WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6:
C12N 15/53, 15/82, A01H 5/00



A1



(11) International Publication Number: WO 98/56921 

(43) International Publication Date: 17 December 1998 (17.12.98) 



(21) International Application Number: PCT/US98/11921

(22) International Filing Date: 10 June 1998 (10.06.98) 



(30) Priority Data: 

60/049,752    12 June 1997 (12.06.97)    US



(71) Applicant: DOW AGROSCIENCES LLC [US/US]; 9330
Zionsville Road, Indianapolis, IN 46268 (US).

(72) Inventors: AINLEY, Michael; 1474 Clearwater Court, Carmel,
IN 46032 (US). ARMSTRONG, Katherine; 11202 Bridlewood
Trail, Zionsville, IN 46077 (US). BELMAR, Scott;
7920 Pine Lake Road, Indianapolis, IN 46268 (US). FOLKERTS,
Otto; 29 Arrowhead Drive, Guilford, CT 06437
(US). HOPKINS, Nicole; 2518 Chaseway Court, Indianapolis,
IN 46268 (US). MENKE, Michael, A.; 345 North
Lesley Avenue, Indianapolis, IN 46219 (US). PAREDDY,
Dayakar; 665 Woodbine Drive E., Carmel, IN 46033 (US).
PETOLINO, Joseph, F.; 270 Woodstock Court, Zionsville,
IN 46077 (US). SMITH, Kelley; 3445 E. County Road 700
N., Lebanon, IN 46052 (US). WOOSLEY, Aaron; 8906
Tanner Drive, Fishers, IN 46038 (US).

(74) Agent: STUART, Donald, R.; Dow AgroSciences LLC, 9330 
Zionsville Road, Indianapolis, IN 46268 (US). 



(81) Designated States: AL, AM, AT, AU, AZ, BA, BB, BG, BR, 
BY, CA, CH, CN, CU, CZ, DE, DK, EE, ES, FI, GB, GE, 
GH, HU, IL, IS, JP, KE, KG, KR, KZ, LC, LK, LR, LS, 
LT, LU, LV, MD, MG, MK, MN, MW, MX, NO, NZ, PL, 
PT, RO, RU, SD, SE, SG, SI, SK, SL, TJ, TM, TR, TT, 
UA, UG, UZ, YU, ZW, ARIPO patent (GH, GM, KE, LS, 
MW, SD, SZ, UG, ZW), Eurasian patent (AM, AZ, BY, 
KG, KZ, MD, RU, TJ, TM), European patent (AT, BE, CH, 
CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, 
PT, SE), OAPI patent (BF, BJ, CF, CG, CI, CM, GA, GN, 
ML, MR, NE, SN, TD, TG). 



Published 

With international search report. 

Before the expiration of the time limit for amending the 
claims and to be republished in the event of the receipt of 
amendments. 



(54) Title: REGULATORY SEQUENCES FOR TRANSGENIC PLANTS 
(57) Abstract 

Regulatory sequences derived from a maize root preferential cationic peroxidase gene (Per5), including the promoter, introns, and
the 3' untranslated region, are useful to control expression of recombinant genes in plants. 






FOR THE PURPOSES OF INFORMATION ONLY 
Codes used to identify States party to the PCT on the front pages of pamphlets publishing international applications under the PCT. 



AL  Albania
AM  Armenia
AT  Austria
AU  Australia
AZ  Azerbaijan
BA  Bosnia and Herzegovina
BB  Barbados
BE  Belgium
BF  Burkina Faso
BG  Bulgaria
BJ  Benin
BR  Brazil
BY  Belarus
CA  Canada
CF  Central African Republic
CG  Congo
CH  Switzerland
CI  Côte d'Ivoire
CM  Cameroon
CN  China
CU  Cuba
CZ  Czech Republic
DE  Germany
DK  Denmark
EE  Estonia
ES  Spain
FI  Finland
FR  France
GA  Gabon
GB  United Kingdom
GE  Georgia
GH  Ghana
GN  Guinea
GR  Greece
HU  Hungary
IE  Ireland
IL  Israel
IS  Iceland
IT  Italy
JP  Japan
KE  Kenya
KG  Kyrgyzstan
KP  Democratic People's Republic of Korea
KR  Republic of Korea
KZ  Kazakstan
LC  Saint Lucia
LI  Liechtenstein
LK  Sri Lanka
LR  Liberia
LS  Lesotho
LT  Lithuania
LU  Luxembourg
LV  Latvia
MC  Monaco
MD  Republic of Moldova
MG  Madagascar
MK  The former Yugoslav Republic of Macedonia
ML  Mali
MN  Mongolia
MR  Mauritania
MW  Malawi
MX  Mexico
NE  Niger
NL  Netherlands
NO  Norway
NZ  New Zealand
PL  Poland
PT  Portugal
RO  Romania
RU  Russian Federation
SD  Sudan
SE  Sweden
SG  Singapore
SI  Slovenia
SK  Slovakia
SN  Senegal
SZ  Swaziland
TD  Chad
TG  Togo
TJ  Tajikistan
TM  Turkmenistan
TR  Turkey
TT  Trinidad and Tobago
UA  Ukraine
UG  Uganda
US  United States of America
UZ  Uzbekistan
VN  Viet Nam
YU  Yugoslavia
ZW  Zimbabwe




WO 98/56921 PCT/US98/11921 

REGULATORY SEQUENCES FOR TRANSGENIC PLANTS 
This invention relates to genetic engineering of plants. More particularly, the 
invention provides DNA sequences and constructs that are useful to control expression of 
recombinant genes in plants. Specific constructs of the invention use novel regulatory 
sequences derived from a maize root preferential cationic peroxidase gene.

Through the use of recombinant DNA technology and genetic engineering, it has 
become possible to introduce desired DNA sequences into plant cells to allow for the 
expression of proteins of interest. However, obtaining desired levels of expression remains 
a challenge. To express agronomically important genes in crops at desired levels through 
genetic engineering requires the ability to control the regulatory mechanisms governing
expression in plants, and this requires access to suitable regulatory sequences that can be 
coupled with the genes it is desired to express. 

A given project may require use of several different expression elements, for 
example one set to drive a selectable marker or reporter gene and another to drive the gene 
of interest. The selectable marker may not require the same expression level or pattern as
that required for the gene of interest. Depending upon the particular project, there may be a 
need for constitutive expression, which directs transcription in most or all tissues at all 
times, or there may be a need for tissue specific expression. For example, a root specific 
or root preferential expression in maize would be highly desirable for use in expressing a 
protein toxic to pests that attack the roots of maize.

Cells use a number of regulatory mechanisms to control which genes are expressed 
and the level at which they are expressed. Regulation can be transcriptional or post- 
transcriptional and can include, for example, mechanisms to enhance, limit, or prevent 
transcription of the DNA, as well as mechanisms that limit the life span of the mRNA after 
it is produced. The DNA sequences involved in these regulatory processes can be located
upstream, downstream or even internally to the structural DNA sequences encoding the 
protein product of a gene. 

Initiation of transcription of a gene is regulated by a sequence, called the promoter, 
located upstream (5') of the coding sequence. Eukaryotic promoters generally contain a 
sequence with homology to the consensus 5'-TATAAT-3' (TATA box) about 10-35 base
pairs (bp) upstream of the transcription start (CAP) site. Most maize genes have a TATA 
box 29 to 34 base pairs upstream of the CAP site. In most instances the TATA box is
required for accurate transcription initiation. Further upstream, often between -80 and
-100, there can be a promoter element with homology to the consensus sequence CCAAT.
This sequence is not well conserved in many species including maize. However, genes 
which have this sequence appear to be efficiently expressed. In plants the CCAAT "box" is 
sometimes replaced by the AGGA "box". Other sequences conferring tissue specificity,
response to environmental signals or maximum efficiency of transcription may be found
interspersed with these promoter elements or found further in the 5' direction from the CAP
site. Such sequences are often found within 400 bp of the CAP site, but may extend as far 
as 1000 bp or more. 
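
By way of illustration only, the positional relationship described above can be sketched in Python. The function name, the exact-match search, the convention of measuring distance to the first base of the motif, and the test sequence are all assumptions made for this sketch; the text speaks of homology to a consensus, which an exact match only approximates.

```python
def find_tata_boxes(upstream, motif="TATAAT", lo=10, hi=35):
    """Return distances (bp upstream of the CAP site) of exact motif matches.

    `upstream` is the 5' flanking sequence written 5'->3', ending at the base
    immediately before the CAP site. Distance is measured to the first base
    of the motif (an assumption made for this sketch).
    """
    seq = upstream.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        distance = len(seq) - start
        if lo <= distance <= hi:
            hits.append(distance)
        start = seq.find(motif, start + 1)  # also catches overlapping matches
    return hits


# Invented test sequence: one TATAAT motif starting 30 bp upstream of the CAP site.
demo_upstream = "GC" * 20 + "TATAAT" + "GC" * 12
print(find_tata_boxes(demo_upstream))                # general 10-35 bp window
print(find_tata_boxes(demo_upstream, lo=29, hi=34))  # narrower maize window from the text
```

Narrowing `lo` and `hi` to 29 and 34 applies the window reported above for most maize genes.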

Promoters can be classified into two general categories. "Constitutive" promoters
are expressed in most tissues most of the time. Expression from a constitutive promoter is
more or less at a steady state level throughout development. Genes encoding proteins with
house-keeping functions are often driven by constitutive promoters. Examples of
constitutively expressed genes in maize include actin and ubiquitin. Wilmink et al. (1995).
"Regulated" promoters are typically expressed in only certain tissue types (tissue specific
promoters) or at certain times during development (temporal promoters). Examples of
tissue specific genes in maize include the zeins (Kriz et al. (1987)), which are abundant
storage proteins found only in the endosperm of seed. Many genes in maize are regulated
by promoters that are both tissue specific and temporal.

It has been demonstrated that promoters can be used to control expression of
foreign genes in transgenic plants in a manner similar to the expression pattern of the gene
from which the promoter was originally derived. The most thoroughly characterized
promoter tested with recombinant genes in plants has been the 35S promoter from the
Cauliflower Mosaic Virus (CaMV) and its derivatives. U.S. Patent No. 5,352,065; Wilmink
et al. (1995); Datla et al. (1993). Elegant studies conducted by Benfey et al. (1984) reveal
that the CaMV 35S promoter is modular in nature with regard to binding to transcription
activators. U.S. Patent No. 5,097,025; Benfey et al. (1989) and (1990). Two independent
domains result in the transcriptional activation that has been described by many as
constitutive. The 35S promoter is very efficiently expressed in most dicots and is
moderately expressed in monocots. The addition of enhancer elements to this promoter has
increased expression levels in maize and other monocots. Constitutive promoters of
monocot origin (that are not as well studied) include the polyubiquitin-1 promoter and the
rice actin-1 promoter. Wilmink et al. (1995). In addition, a recombinant promoter, Emu,
has been constructed and shown to drive expression in monocots in a constitutive manner.
Wilmink et al. (1995).

Few tissue specific promoters have been characterized in maize. The promoters 
from the zein gene and oleosin gene have been found to regulate GUS in a tissue specific
manner. Kriz et al (1987); Lee and Huang (1994). No root specific promoters from 
maize have been described in the literature. However, promoters of this type have been 
characterized in other plant species. 

Despite both the important role of tissue specific promoters in plant development
and the opportunity that availability of a root preferential promoter would represent for
plant biotechnology, relatively little work has yet been done on the regulation of gene
expression in roots. Yamamoto reported the expression of the E. coli uidA gene, encoding
β-glucuronidase (GUS), under control of the promoter of a tobacco (N. tabacum)
root-specific gene, TobRB7. Yamamoto et al. (1991); Conkling et al. (1990). Root specific
expression of the fusion genes was analyzed in transgenic tobacco. Significant expression
was found in the root-tip meristem and vascular bundle. EPO Application Number 452 269
(De Framond) teaches that promoters from metallothionein-like genes are able to function
as promoters of tissue-preferential transcription of associated DNA sequences in plants,
particularly in the roots. Specifically, a promoter from a metallothionein-like gene was
operably linked to a GUS reporter gene and tobacco leaf disks were transformed. The
promoter was shown to express in roots, leaves and stems. WO 91/13992 (Croy et al.)
teaches that rape (Brassica napus L.) extensin gene promoters are capable of directing
tissue-preferential transcription of associated DNA sequences in plants, particularly in the
roots. Specifically, a rape extensin gene promoter was operably linked to an extA (extensin
structural gene) and tobacco leaf disks were transformed. It was reported that northern
analysis revealed no hybridization of an extensin probe to leaf RNA from either control or
transformed tobacco plants and hybridization of the extensin probe to transgenic root RNA
of all transformants tested, although the levels of hybridization varied for the transformants
tested. While each of these promoters has shown some level of tissue-preferential gene
expression in a dicot model system (tobacco), the specificity of these promoters, and
expression patterns and levels resulting from activity of the promoters, has yet to be
achieved in monocots, particularly maize.

DNA sequences called enhancer sequences have been identified which have been
shown to enhance gene expression when placed proximal to the promoter. Such sequences
have been identified from viral, bacterial, and plant gene sources. An example of a well
characterized enhancer sequence is the ocs sequence from the octopine synthase gene in
Agrobacterium tumefaciens. This short (40 bp) sequence has been shown to increase gene
expression in both dicots and monocots, including maize, by significant levels. Tandem
repeats of this enhancer have been shown to increase expression of the GUS gene eight-fold
in maize. It remains unclear how these enhancer sequences function. Presumably
enhancers bind activator proteins and thereby facilitate the binding of RNA polymerase II
to the TATA box. Grunstein (1992). WO95/14098 describes testing of various multiple
combinations of the ocs enhancer and the mas (mannopine synthase) enhancer which
resulted in several hundred fold increase in gene expression of the GUS gene in transgenic
tobacco callus.

The 5' untranslated leader sequence of mRNA, introns, and the 3' untranslated 
region of mRNA affect expression by their effect on post-transcription events, for example
by facilitating translation or stabilizing mRNA. 

Expression of heterologous plant genes has also been improved by optimization of
the non-translated leader sequence, i.e. the 5' end of the mRNA extending from the 5' CAP
site to the AUG translation initiation codon of the mRNA. The leader plays a critical role
in translation initiation and in regulation of gene expression. For most eukaryotic mRNAs,
translation initiates with the binding of the CAP binding protein to the mRNA CAP. This
is then followed by the binding of several other translation factors, as well as the 43S
ribosome pre-initiation complex. This complex travels down the mRNA molecule while
scanning for an AUG initiation codon in an appropriate sequence context. Once this has
been found, and with the addition of the 60S ribosomal subunit, the complete 80S initiation
complex initiates protein translation. Pain (1986); Kozak (1986). Optimization of the
leader sequence for binding to the ribosome complex has been shown to increase gene
expression as a direct result of improved translation initiation efficiency. Significant
increases in gene expression have been produced by addition of leader sequences from
plant viruses or heat shock genes. Raju et al. (1993); Austin (1994) reported that the
length of the 5' non-translated leader was important for gene expression in protoplasts.

In addition to the untranslated leader sequence, the region directly around the AUG
start appears to play an important role in translation initiation. Luerhsen and Walbot
(1994). Optimization of the 9 bases around the AUG start site to a Kozak consensus
sequence was reported to improve transient gene expression 10-fold in BMS protoplasts.
McElroy et al. (1994).
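
As a purely illustrative sketch, the scanning model and the 9-base window around the start codon can be expressed in Python. The function name, the choice of 3 bases on each side of the AUG, and the test sequence are assumptions made for this sketch. It also takes the first AUG regardless of context, whereas the text notes that the scanning complex looks for an AUG in an appropriate sequence context.

```python
def scan_for_start(mrna, flank=3):
    """Scan 5'->3' for the first AUG; report leader length and local context.

    Accepts RNA or DNA letters. Returns None when no AUG is present.
    """
    seq = mrna.upper().replace("T", "U")
    pos = seq.find("AUG")
    if pos == -1:
        return None
    # flank bases 5' of the AUG, the AUG itself, and flank bases 3' of it
    context = seq[max(0, pos - flank):pos + 3 + flank]
    return {"leader_length": pos, "context": context}


# Invented transcript: a 9-base leader followed by AUG.
demo_mrna = "GCAGCCACCAUGGCGUAA"
print(scan_for_start(demo_mrna))
```

With `flank=3` the returned context is the 9-base window centered on the start codon; a real analysis would compare that window against a consensus rather than merely report it.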

Studies characterizing the role of introns in the regulation of gene expression have
shown that the first intron of the maize alcohol dehydrogenase gene (Adh-1) has the ability
to increase expression under anaerobiosis. Callis et al. (1987). The intron also stimulates
expression (to a lesser degree) in the absence of anaerobiosis. This enhancement is thought
to be a result of a stabilization of the pre-mRNA in the nucleus. Mascarenhas et al.
reported a 12-fold and 20-fold enhancement of CAT expression by use of the Adh-1 intron.
Mascarenhas et al. (1990). Several other introns have been identified from maize and other
monocots which increase gene expression. Vain et al. (1996).

The 3' end of the mRNA can also have a large effect on expression, and is believed
to interact with the 5' CAP. Sullivan (1993). The 3' untranslated region (3'UTR) has been
shown to have a significant role in gene expression of several maize genes. Specifically, a
200 base pair 3' sequence has been shown to be responsible for suppression of light
induction of the maize small m3 subunit of the ribulose-1,5-bisphosphate carboxylase gene
(rbcS-m3) in mesophyll cells. Viret et al. (1994). Some 3'UTRs have been shown to
contain elements that appear to be involved in instability of the transcript. Sullivan et al.
(1993). The 3'UTRs of most eukaryotic genes contain consensus sequences for
polyadenylation. In plants, especially maize, this sequence is not very well conserved. The
3' untranslated region, including a polyadenylation signal, derived from a nopaline synthase
gene (3' nos) is frequently used in plant genetic engineering. Few examples of
heterologous 3'UTR testing in maize have been published.

Important aspects of the present invention are based on the discovery that DNA 
sequences derived from a maize root specific cationic peroxidase gene are exceptionally 
useful for use in regulating expression of recombinant genes in plants. 

The peroxidases (donor:hydrogen-peroxide oxidoreductase, EC 1.11.1.7) are highly
catalytic enzymes with many potential substrates in the plant. See Gaspar et al. (1982).
They have been implicated in such diverse functions as secondary cell wall biosynthesis,
wound-healing, auxin catabolism, and defense of plants against pathogen attack. See
Lagrimini and Rothstein (1987); Morgens et al. (1990); Nakamura et al. (1988); Fujiyama
et al. (1988); and Mazza et al. (1980).

Most higher plants possess a number of different peroxidase isozymes whose
pattern of expression is tissue specific, developmentally regulated, and influenced by
environmental factors. Lagrimini & Rothstein (1987). Based upon their isoelectric point,
plant peroxidases are subdivided into three subgroups: anionic, moderately anionic, and
cationic.

The function of anionic peroxidase isozymes (pI, 3.5-4.0) is best understood.
Isozymes from this group are usually cell wall associated. They display a high activity for
polymerization of cinnamyl alcohols in vitro and have been shown to function in
lignification and cross-linking of extensin monomers and feruloylated polysaccharides.
Lagrimini and Rothstein (1987). In both potato and tomato, expression of anionic
peroxidases has been shown to be induced upon both wound induction and abscisic acid
treatment. Buffard et al. (1990). This suggests their involvement in both wound healing
and in the regulation of tissue suberization.

Moderately anionic peroxidase isozymes (pI, 4.5-6.5) are also cell wall associated
and have some activity toward lignin precursors. In tobacco, isozymes of this class have
been shown to be highly expressed in wounded stem tissue. Fujiyama et al. (1988). These
isozymes may also serve a function in suberization and wound healing. Morgens et al.
(1990).

The actual function of cationic peroxidase isozymes (pI, 8.1-11) in the plant
remains unclear. Some members of this group, however, have been shown to efficiently
catalyze the synthesis of H2O2 from NADH and H2O. Others are localized to the central
vacuole. In the absence of H2O2, some of these isozymes possess indoleacetic acid
oxidase activity. Lagrimini and Rothstein (1987).

Electrophoretic studies of maize peroxidases have revealed 13 major isozymes. 
Brewbaker et al (1985). All isozymes were judged to be functional as monomers, despite 
major differences in molecular weight. All maize tissues had more than one active 
peroxidase locus, and all loci were tissue-specific. The peroxidases have proved unique in 
that no maize tissue has been found without activity, and no peroxidase has proven
expressed in all maize tissues.

Summary Of The Invention

The invention provides isolated DNA molecules derived from the per5 maize root
preferential cationic peroxidase gene that can be used in recombinant constructs to control
expression of genes in plants. More particularly, the invention provides isolated DNA
molecules derived from the per5 promoter sequence and having as at least a part of its
sequence bp 4086-4148 of SEQ ID NO 1. Preferred embodiments are isolated DNA
molecules that have as part of their sequences bp 4086 to 4200, bp 4086 to 4215, bp 3187
to 4148, bp 3187 to 4200, bp 3187 to 4215, bp 2532-4148, bp 2532 to 4200, bp 2532 to
4215, bp 1-4148, bp 1-4200, or bp 1-4215 of SEQ ID NO 1.
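
For illustration only, the relationship between the recited base-pair ranges can be checked with a short Python sketch. The ranges are taken from the paragraph above and read as 1-based and inclusive; the helper names and the placeholder sequence are hypothetical and stand in for SEQ ID NO 1, which is not reproduced here.

```python
# 1-based, inclusive base-pair ranges as recited in the text
CORE = (4086, 4148)
PREFERRED = [
    (4086, 4200), (4086, 4215),
    (3187, 4148), (3187, 4200), (3187, 4215),
    (2532, 4148), (2532, 4200), (2532, 4215),
    (1, 4148), (1, 4200), (1, 4215),
]


def length(r):
    """Number of base pairs in an inclusive range."""
    return r[1] - r[0] + 1


def to_slice(r):
    """Convert a 1-based inclusive range to a 0-based Python slice."""
    return slice(r[0] - 1, r[1])


def contains(outer, inner):
    """True when `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


placeholder = "N" * 4215  # stand-in sequence, not the real SEQ ID NO 1
print(length(CORE), len(placeholder[to_slice(CORE)]))
print(all(contains(r, CORE) for r in PREFERRED))
```

Every preferred range in the list contains the core range bp 4086-4148, which is the property the sketch verifies.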

The invention also provides isolated DNA molecules selected from the following
per5 intron sequences: bp 4426-5058, bp 4420-5064, bp 5251-5382, bp 5245-5388, bp
5549-5649, and bp 5542-5654 of SEQ ID NO 1.

The invention also provides isolated DNA molecules derived from the per5
transcription termination sequence and having the sequence of bp 6068-6431 of SEQ ID
NO 1.

In another of its aspects, the present invention provides a recombinant gene cassette 
competent for effecting preferential expression of a gene of interest in a selected tissue of 
transformed maize, said gene cassette comprising: 

a) a promoter from a first maize gene, said first maize gene being one that is 
naturally expressed preferentially in the selected tissue;

b) an untranslated leader sequence; 

c) the gene of interest, said gene being one other than said first maize gene; 

d) a 3'UTR; 

said promoter, untranslated sequence, gene of interest, and 3'UTR being operably linked 
from 5' to 3'; and

e) an intron sequence that is incorporated in said untranslated leader sequence 
or in said gene of interest, said intron sequence being from an intron of a maize gene that is 
preferentially expressed in said selected tissue.

A related embodiment of the invention is a recombinant gene cassette competent for
effecting constitutive expression of a gene of interest in transformed maize comprising:

a) a promoter from a first maize gene, said first maize gene being one that is
naturally expressed preferentially in a specific tissue;

b) an untranslated leader sequence;

c) the gene of interest, said gene being one other than said first maize gene;

d) a 3'UTR;

said promoter, untranslated sequence, gene of interest, and 3'UTR being operably linked
from 5' to 3'; and

e) an intron sequence that is incorporated in said untranslated leader or in said
gene of interest, said intron sequence being from an intron of a maize gene that is naturally
expressed constitutively.

In a particular embodiment the intron is one from the maize Adh1 expressed gene,
and the resulting recombinant gene cassette provides constitutive expression in maize.

In another of its aspects, the invention provides DNA constructs comprising,
operatively linked in the 5' to 3' direction,

a) a promoter having as at least part of its sequence bp 4086-4148 of
SEQ ID NO 1;

b) an untranslated leader sequence comprising bp 4149-4200 of SEQ
ID NO 1;

c) a gene of interest not naturally associated with said promoter; and

d) a 3'UTR.

Preferred embodiments of this aspect of the invention are those wherein the promoter 
comprises bp 3187 to 4148, bp 2532-4148, or bp 1-4148 of SEQ ID NO 1. Particularly 
preferred are each of the preferred embodiments wherein said 3'UTR has the sequence of
bp 6066-6340 or bp 6066-6439 of SEQ ID NO 1. 

In another of its aspects, the invention provides DNA constructs comprising,
operatively linked in the 5' to 3' direction,

a) a promoter having as at least part of its sequence bp 4086-4148 of
SEQ ID NO 1;

b) an untranslated leader sequence not naturally associated with said
promoter;

c) a gene of interest; and

d) a 3'UTR.

Preferred embodiments of this aspect of the invention are those wherein the promoter 
comprises bp 3187 to 4148, bp 2532-4148, or bp 1-4148 of SEQ ID NO 1. Particularly 
preferred are each of the preferred embodiments wherein said 3'UTR has the sequence of
bp 6066-6340 or bp 6066-6439 of SEQ ID NO 1. 

In another of its aspects, the invention provides a DNA construct comprising, 
operatively linked in the 5' to 3' direction, 

a) a promoter having as at least a part of its sequence bp 4086-4148
of SEQ ID NO 1;

b) an untranslated leader sequence comprising bp 4149-4200 of SEQ
ID NO 1;

c) an intron selected from the group consisting of an Adh1 gene intron
and bp 4426-5058 of SEQ ID NO 1;

d) a gene of interest; and

e) a 3'UTR.

Preferred embodiments of this aspect of the invention are again those wherein the promoter 
comprises bp 3187 to 4148, bp 2532-4148, or bp 1-4148 of SEQ ID NO 1. Particularly 
preferred are each of the preferred embodiments wherein said 3'UTR has the sequence of 
bp 6066-6340 or bp 6066-6439 of SEQ ID NO 1.

In another of its aspects, the invention provides a DNA construct comprising, in the 
5' to 3' direction, 

a) a promoter having as at least part of its sequence bp 4086-4148 of
SEQ ID NO 1;

b) an untranslated leader sequence;

c) an intron selected from the group consisting of an Adh1 gene intron
and bp 4426-5058 of SEQ ID NO 1;

d) a cloning site; 

e) a 3'UTR. 

In accordance with another significant aspect of the invention, there is provided a
recombinant gene cassette comprised of the following operably linked sequences, from 5'
to 3': a promoter; an untranslated leader sequence; a gene of interest; and the per5 3'UTR,
bp 6068-6431 of SEQ ID NO 1.

In another of its aspects, the invention provides a plasmid comprising a promoter 
having as at least part of its sequence bp 4086-4148 of SEQ ID NO 1.

In another of its aspects, the invention provides a transformed plant comprising at
least one plant cell that contains a DNA construct of the invention. The plant may be a
monocot or dicot. Preferred plants are maize, rice, cotton and tobacco. 

In another of its aspects, the invention provides seed or grain that contains a DNA 
construct of the invention.

Detailed Description Of The Invention

In one of its aspects, the present invention relates to regulatory sequences derived
from the maize root preferential cationic peroxidase protein (per5) that are able to regulate
expression of associated DNA sequences in plants. More specifically, the invention
provides novel promoter sequences and constructs using them. It also provides novel DNA
constructs utilizing the per5 untranslated leader and/or 3'UTR. It also provides novel DNA
constructs utilizing the introns from the per5 gene.

The DNA sequence for a 6550 bp fragment of the genomic clone of the maize
root-preferential cationic peroxidase gene is given in SEQ ID NO 1. The sequence includes a 5'
flanking region (nt 1-4200), of which nucleotides 4149-4200 correspond to the untranslated
leader sequence. The coding sequence for the maize root-preferential cationic peroxidase is
composed of four exons: exon 1 (nt 4201-4425), exon 2 (nt 5059-5250), exon 3 (nt 5383-5547),
and exon 4 (nt 5649-6065). It should be noted that the first 96 nucleotides of exon 1
(nt 4201-4296) code for a 32 amino acid signal peptide, which is excised from the
polypeptide after translation to provide the mature protein. Three introns were found:
intron 1 (nt 4426-5058), intron 2 (nt 5251-5382), and intron 3 (nt 5548-5648). The 3' flanking
region (373 nucleotides in length) extends from nucleotide 6069 (after the UGA codon at
nucleotides 6066-6068) to nucleotide 6550, including a polyadenylation signal at
nucleotides 6307-6312.
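
The coordinates recited above can be cross-checked with a brief Python sketch, offered for illustration only. The coordinates are those given in the text, read as 1-based and inclusive; the variable and function names are hypothetical. The sketch confirms that the leader, exons, introns and stop codon tile the transcribed region without gaps or overlaps, and derives the coding length by simple arithmetic.

```python
# Feature coordinates as recited in the text (1-based, inclusive)
LEADER = (4149, 4200)
EXONS = [(4201, 4425), (5059, 5250), (5383, 5547), (5649, 6065)]
INTRONS = [(4426, 5058), (5251, 5382), (5548, 5648)]
SIGNAL_PEPTIDE = (4201, 4296)
STOP_CODON = (6066, 6068)


def span(r):
    """Length in nucleotides of an inclusive range."""
    return r[1] - r[0] + 1


def is_contiguous(features):
    """True when the sorted features abut with no gap and no overlap."""
    ordered = sorted(features)
    return all(a[1] + 1 == b[0] for a, b in zip(ordered, ordered[1:]))


coding_nt = sum(span(e) for e in EXONS)
print("tiles without gaps:", is_contiguous([LEADER] + EXONS + INTRONS + [STOP_CODON]))
print("exon lengths:", [span(e) for e in EXONS])
print("intron lengths:", [span(i) for i in INTRONS])
print("coding nt:", coding_nt, "whole codons:", coding_nt % 3 == 0)
print("signal peptide codons:", span(SIGNAL_PEPTIDE) // 3)
```

Under these coordinates the four exons sum to a whole number of codons, and the 96-nucleotide signal peptide region corresponds to the 32 amino acids stated in the text.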

We have discovered that promoters derived from certain tissue preferential maize
genes require the presence of an intron in the transcribed portion of the gene in order for
them to provide effective expression in maize and that the temporal and tissue specificity
observed depends on the intron used. A recombinant gene cassette having a tissue
preferential maize promoter, but lacking an intron in the transcribed portion of the gene,
does not give appropriate expression in transformed maize. If the transcribed portion of the
cassette includes an intron derived from a maize gene of similar tissue specificity to the
maize gene from which the promoter was obtained, the gene cassette will restore tissue
preferential expression in maize. The intron may be, but need not necessarily be, from the
same gene as the promoter. If an intron derived from another maize gene, such as Adh1
intron 1, is used in a gene cassette with a promoter from a tissue preferential maize gene,
the cassette will give generally constitutive expression in maize. We have also found that
these considerations apply to transgenic maize, but not to transgenic rice. Tissue
preferential maize promoters can be used to drive recombinant genes in rice without an
intron.
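
The findings stated above amount to a small decision table, sketched here in Python purely as a reading aid. The labels "matched" and "unmatched", the table name and the function name are hypothetical shorthand introduced for this sketch; combinations the text does not address are reported as such rather than guessed.

```python
# Keys: (host plant, intron category); values paraphrase the stated findings.
# "matched"   = intron from a maize gene of similar tissue specificity to the promoter's gene
# "unmatched" = intron from another maize gene, such as Adh1 intron 1
EXPRESSION_RULES = {
    ("maize", None): "no appropriate expression",
    ("maize", "matched"): "tissue preferential expression",
    ("maize", "unmatched"): "generally constitutive expression",
    ("rice", None): "promoter drives expression without an intron",
}


def expected_expression(host, intron=None):
    """Look up the outcome described in the text for a tissue preferential maize promoter."""
    return EXPRESSION_RULES.get((host, intron), "not described in the text")


for case in [("maize", None), ("maize", "matched"), ("maize", "unmatched"), ("rice", None)]:
    print(case, "->", expected_expression(*case))
```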

In accordance with the foregoing unexpected and significant findings, the present 
invention provides a recombinant gene cassette competent for effecting preferential 
expression of a gene of interest in a selected tissue of transformed maize, said gene cassette 
comprising:

a) a promoter from a first maize gene, said first maize gene being one that is 
naturally expressed preferentially in the selected tissue; 

b) an untranslated leader sequence; 

c) the gene of interest, said gene being one other than said first maize gene; 

d) a 3'UTR;

said promoter, untranslated sequence, gene of interest, and 3'UTR being operably linked
from 5' to 3'; and

e) an intron sequence that is incorporated in said untranslated leader sequence
or in said gene of interest, said intron sequence being from an intron of a maize gene that is
preferentially expressed in said selected tissue.

The promoter used in this embodiment can be from any maize gene that is preferentially 
expressed in the tissue of interest. Such maize genes can be identified by conventional 
methods, for example, by techniques involving differential screening of mRNA sequences.

A detailed example of identification and isolation of a tissue preferential maize gene
is given herein for the root preferential maize cationic peroxidase gene. The method
illustrated in this example can be used to isolate additional genes from various maize
tissues. 

Examples of tissue preferential maize genes that have promoters suitable for use in 
the invention include: O-methyl transferase and glutamine synthetase 1. 

A preferred promoter is the per5 promoter, i.e. the promoter from the root 
preferential maize cationic peroxidase gene. Particularly preferred is the promoter 
comprising bp 1 to 4215 of SEQ ID NO 1. 

The non-translated leader sequence can be derived from any suitable source and 
may be specifically modified to increase the translation of the mRNA. The 5' non- 
translated region may be obtained from the promoter selected to express the gene, the 
native leader sequence of the gene or coding region to be expressed, viral RNAs, suitable 
eukaryotic genes, or may be a synthetic sequence. 

The gene of interest may be any gene that it is desired to express in plants. 
Particularly useful genes are those that confer tolerance to herbicides, insects, or viruses, 
and genes that provide improved nutritional value or processing characteristics of the plant. 
Examples of suitable agronomically useful genes include the insecticidal gene from 
Bacillus thuringiensis for conferring insect resistance and the 5'-enolpyruvyl-3'- 
phosphoshikimate synthase (EPSPS) gene and any variant thereof for conferring tolerance 
to glyphosate herbicides. Other suitable genes are identified hereinafter. As is readily 
understood by those skilled in the art, any agronomically important gene conferring a
desired trait can be used.

The 3' UTR, or 3' untranslated region, that is employed is one that confers efficient 
processing of the mRNA, maintains stability of the message and directs the addition of 
adenosine ribonucleotides to the 3' end of the transcribed mRNA sequence. The 3' UTR 
may be native with the promoter region, native with the structural gene, or may be derived 
from another source. Suitable 3' UTRs include, but are not limited to, the per5 3' UTR and
the 3' UTR of the nopaline synthase (nos) gene.

The intron used will depend on the particular tissue in which it is desired to 
preferentially express the gene of interest. For tissue preferential expression in maize, the 
intron should be selected from a maize gene that is naturally expressed preferentially in the 
selected tissue.

The intron must be incorporated into a transcribed region of the cassette. It is
preferably incorporated into the untranslated leader 5' of the gene of interest and 3' of the
promoter or within the translated region of the gene. 

Why certain tissue preferential maize genes require an intron to enable effective 
expression in maize tissues is not known, but experiments indicate that the critical event is
post-transcriptional processing. Accordingly, the present invention requires that the intron 
be provided in a transcribed portion of the gene cassette. 
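
The layout of such a cassette can be sketched as a small data structure. The following is only an illustrative sketch: the class name, the string labels for element kinds, and the `check_cassette` helper are hypothetical and are not part of the disclosure. It checks the two structural requirements stated above, namely that promoter, untranslated leader, gene of interest and 3'UTR are arranged 5' to 3', and that the intron sits in a transcribed portion (the leader or the gene of interest).

```python
from dataclasses import dataclass
from typing import List

# Required 5'-to-3' order of the core elements (labels are illustrative).
CORE_ORDER = ["promoter", "leader", "gene", "3'UTR"]


@dataclass
class Element:
    kind: str       # "promoter", "leader", "gene", "3'UTR" or "intron"
    name: str       # free-text label for the element
    host: str = ""  # for an intron: kind of the element it is inserted into


def check_cassette(elements: List[Element]) -> List[str]:
    """Return a list of layout problems; an empty list means none were found."""
    problems = []
    core = [e.kind for e in elements if e.kind != "intron"]
    if core != CORE_ORDER:
        problems.append(f"core elements must read 5' to 3' as {CORE_ORDER}, got {core}")
    introns = [e for e in elements if e.kind == "intron"]
    if not introns:
        problems.append("no intron present")
    for intron in introns:
        # The intron must lie in a transcribed portion of the cassette.
        if intron.host not in ("leader", "gene"):
            problems.append(f"intron '{intron.name}' must be in the leader or the gene of interest")
    return problems


cassette = [
    Element("promoter", "tissue-preferential maize promoter"),
    Element("leader", "untranslated leader"),
    Element("intron", "intron from a tissue-preferential maize gene", host="leader"),
    Element("gene", "gene of interest"),
    Element("3'UTR", "3'UTR"),
]
print(check_cassette(cassette))
```

A cassette whose intron was placed in the promoter region would be reported as a problem by this check, reflecting the requirement that the intron be transcribed.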

A related embodiment of the invention is a recombinant gene cassette competent for 
effecting constitutive expression of a gene of interest in transformed maize comprising: 

a) a promoter from a first maize gene, said first maize gene being one that is
naturally expressed preferentially in a specific tissue;

b) an untranslated leader sequence; 

c) the gene of interest, said gene being one other than said first maize gene; 

d) a 3'UTR; 

said promoter, untranslated sequence, gene of interest, and 3'UTR being operably linked
from 5' to 3'; and

e) an intron sequence that is incorporated in said untranslated leader or in said 
gene of interest, said intron sequence being from an intron of a maize gene that is naturally 
expressed constitutively. 

This embodiment differs from the previous embodiment in that the intron is one from a
gene expressed in most tissues, and the expression obtained from the resulting recombinant
gene cassette in maize is constitutive. Suitable introns for use in this embodiment of the
invention include Adh1 intron 1, Ubiquitin intron 1, and Bronze 2 intron 1. Particularly
preferred is the Adh1 intron 1. Although it has previously been reported that the Adh1
intron 1 is able to enhance expression of constitutively expressed genes, it has never been
reported or suggested that the Adh1 intron can alter the tissue preferential characteristics of
a tissue preferential maize promoter. 

The present invention is generally applicable to the expression of structural genes in 
both monocotyledonous and dicotyledonous plants. This invention is particularly suitable
for any member of the monocotyledonous (monocot) plant family including, but not
limited to, maize, rice, barley, oats, wheat, sorghum, rye, sugarcane, pineapple, yams,
onion, banana, coconut, and dates. A preferred application of the invention is in production
of transgenic maize plants.

This invention, utilizing a promoter constructed for monocots, is particularly 
applicable to the family Graminaceae, in particular to maize, wheat, rice, oat, barley and 
sorghum. 

In accordance with another aspect of the invention, there is provided a recombinant 
gene cassette comprised of: a promoter; an untranslated leader sequence; a gene of interest; 
and the per5 3'UTR. Use of the per5 3'UTR provides enhanced expression compared to
similar gene cassettes utilizing the nos 3'UTR. 

The promoter used with the per5 3'UTR can be any promoter suitable for use in
plants. Suitable promoters can be obtained from a variety of sources, such as plants or
plant DNA viruses. Preferred promoters are the per5 promoter, the 35T promoter
(described hereinafter in Examples 20 and 23), and the ubiquitin promoter. Useful 
promoters include those isolated from the caulimovirus group, such as the cauliflower 
mosaic virus 19S and 35S (CaMV19S and CaMV35S) transcript promoters. Other useful 
promoters include the enhanced CaMV35S promoter (eCaMV35S) as described by Kay et
al. (1987) and the small subunit promoter of ribulose 1,5-bisphosphate carboxylase
oxygenase (RUBISCO). Examples of other suitable promoters are rice actin gene
promoter; cyclophilin promoter; Adh1 gene promoter, Callis et al. (1987); Class I patatin
promoter, Bevan et al. (1986); ADP glucose pyrophosphorylase promoter; β-conglycinin
promoter, Tierney et al. (1987); E8 promoter, Deikman et al. (1988); 2A11
promoter, Pear et al. (1989); acid chitinase promoter, Samac et al. (1990). The promoter
selected should be capable of causing sufficient expression of the desired protein alone, but
especially when used with the per5 3'UTR, to result in the production of an effective
amount of the desired protein to cause the plant cells and plants regenerated therefrom to 
exhibit the properties which are phenotypically caused by the expressed protein. 

The untranslated leader used with the per5 3'UTR is not critical. The untranslated 
leader will typically be one that is naturally associated with the promoter. The untranslated 
leader may be one that has been modified in accordance with another aspect of the present 
invention to include an intron. It may also be a heterologous sequence, such as one 
provided by US Patent No. 5,362,865. This non-translated leader sequence can be derived
from any suitable source and can be specifically modified to increase translation of the
mRNA.

The gene of interest may be any gene that it is desired to express in plants, as 
described above. 

The terms "per5 3'UTR" and/or "per5 transcription termination region" are intended
to refer to a sequence comprising bp 6068 to 6431 of SEQ ID NO 1.

Construction of gene cassettes utilizing the per5 3'UTR is readily accomplished 
utilizing well known methods, such as those disclosed in Sambrook et al. (1989) and
Ausubel et al. (1987).

As used in the present application, the terms "root-preferential promoter", "root-
preferential expression", "tissue-preferential expression" and "preferential expression" are
used to indicate that a given DNA sequence, derived from the 5' flanking or upstream region
of a plant gene whose structural gene is expressed in root tissue exclusively, or
almost exclusively, and not in the majority of other plant parts, when
connected to an open reading frame of a gene for a protein of known or unknown function
causes some differential effect; i.e., that the transcription of the associated DNA sequences
or the expression of a gene product is greater in some tissue, for example, the roots of a
plant, than in some or all other tissues of the plant, for example, the seed. Expression of
the product of the associated gene can be demonstrated by any conventional RNA, cDNA, or
protein assay, or by a biological assay.

This invention involves the construction of a recombinant DNA construct 
combining DNA sequences from the promoter of a maize root-preferential cationic 
peroxidase gene, a plant expressible structural gene (e.g. the GUS gene, Jefferson (1987))
and a suitable terminator. 

The present invention also includes DNA sequences having substantial sequence
homology with the specifically disclosed regulatory sequences, such that they are able to
have the disclosed effect on expression. 

As used in the present application, the term "substantial sequence homology" is
used to indicate that a nucleotide sequence (in the case of DNA or RNA) or an amino acid
sequence (in the case of a protein or polypeptide) exhibits substantial functional or
structural equivalence with another nucleotide or amino acid sequence. Any functional or
structural differences between sequences having substantial sequence homology will be de
minimis; that is, they will not affect the ability of the sequence to function as indicated in
the present application. For example, a sequence which has substantial sequence homology
with a DNA sequence disclosed to be a root-preferential promoter will be able to direct the
root-preferential expression of an associated DNA sequence. Sequences that have
substantial sequence homology with the sequences disclosed herein are usually variants of
the disclosed sequence, such as mutations, but may also be synthetic sequences.

In most cases, sequences having 95% homology to the sequences specifically 
disclosed herein will function as equivalents, and in many cases considerably less 
homology, for example 75% or 80%, will be acceptable. Locating the parts of these
sequences that are not critical may be time consuming, but is routine and well within the
skill in the art.
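
The percentage figures above can be illustrated with a short calculation. The present text does not specify how percent homology is to be computed, so the sketch below assumes the simplest reading: percent identity over two sequences that have already been aligned to the same length. The function name and the 20-nt sequences are made up for illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned positions at which two equal-length sequences match.

    Assumes the sequences are already aligned; '-' marks a gap and never
    counts as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


reference = "ATGGCTAGCTAGGCTAACGT"  # made-up 20-nt sequence
variant = "ATGGCTAGCTAGGCTAACGA"    # same sequence with one substitution
print(percent_identity(reference, variant))  # 19 of 20 positions match
```

Other definitions (for example, scoring over a local alignment, or counting conservative substitutions) would give different numbers for the same pair of sequences.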

DNA encoding the maize root-preferential cationic peroxidase promoter may be 
prepared from chromosomal DNA or DNA of synthetic origin by using well-known 
techniques. Specifically comprehended as part of this invention are genomic DNA
sequences. Genomic DNA may be isolated by standard techniques. See Sambrook et al.
(1989); Mullis et al. (1987); Horton et al. (1989); Erlich (ed.) (1989). It is also possible to
prepare synthetic sequences by oligonucleotide synthesis. See Caruthers (1983) and 
Beaucage et al. (1981). 

It is contemplated that sequences corresponding to the above noted sequences may 
contain one or more modifications in the sequences from the wild-type but will still render
the respective elements comparable with respect to the teachings of this invention. For
example, as noted above, fragments may be used. One may incorporate modifications into
the isolated sequences including the addition, deletion, or nonconservative substitution of a
limited number of various nucleotides or the conservative substitution of many nucleotides.
Further, the construction of such DNA molecules can employ sources which have been
shown to confer enhancement of expression of heterologous genes placed under their
regulatory control. Exemplary techniques for modifying oligonucleotide sequences include
using polynucleotide-mediated, site-directed mutagenesis. See Zoller et al. (1984);
Higuchi et al. (1988); Ho et al. (1989); Horton et al. (1989); and PCR Technology,
Principles and Applications for DNA Amplification, (ed.) Erlich (1989).

In one embodiment, an expression cassette of this invention will comprise, in the 5'
to 3' direction, the maize root-preferential cationic peroxidase promoter sequence, in
reading frame, one or more nucleic acid sequences of interest followed by a transcript
termination sequence. The expression cassette may be used in a variety of ways, including 
for example, insertion into a plant cell for the expression of the nucleic acid sequence of 
interest. 

The tissue-preferential promoter DNA sequences are preferably linked operably to a
coding DNA sequence, for example, a DNA sequence which is transcribed into RNA, or
which is ultimately expressed in the production of a protein product. 

A promoter DNA sequence is said to be "operably linked" to a coding DNA
sequence if the two are situated such that the promoter DNA sequence influences the
transcription of the coding DNA sequence. For example, if the coding DNA sequence
codes for the production of a protein, the promoter DNA sequence would be operably
linked to the coding DNA sequence if the promoter DNA sequence affects the expression
of the protein product from the coding DNA sequence. For example, in a DNA sequence
comprising a promoter DNA sequence physically attached to a coding DNA sequence in
the same chimeric construct, the two sequences are likely to be operably linked.

The DNA sequence associated with the regulatory or promoter DNA sequence may
be heterologous or homologous, that is, the inserted genes may be from a plant of a
different species than the recipient plant or from a plant of the same species. In either
case, the DNA sequences, vectors and
plants of the present invention are useful for directing transcription of the associated DNA
sequence so that the mRNA transcribed or the protein encoded by the associated DNA
sequence is expressed in greater abundance in some plant tissue, such as the root, leaves or
stem, than in the seed. Thus, the associated DNA sequence preferably may code for a
protein that is desired to be expressed in a plant only in preferred tissue, such as the roots,
leaves or stems, and not in the seed.

Promoters are positioned 5' (upstream) to the genes that they control. As is known
in the art, some variation in this distance can be accommodated without loss of promoter
function. Similarly, the preferred positioning of a regulatory sequence element with respect
to a heterologous gene to be placed under its control is defined by the positioning of the
element in its natural setting, i.e., the genes from which it is derived. Again, as is known in
the art and demonstrated herein with multiple copies of regulatory elements, some variation
in this distance can occur.

Any plant-expressible structural gene can be used in these constructions. A
structural gene is that portion of a gene comprising a DNA segment encoding a protein,
polypeptide, antisense RNA or ribozyme or a portion thereof. The term can refer to copies
of a structural gene naturally found within the cell, but artificially introduced, or the
structural gene may encode a protein not normally found in the plant cell into which the
gene is introduced, in which case it is termed a heterologous gene.

The associated DNA sequence may code, for example, for proteins known to inhibit
insects or plant pathogens such as fungi, bacteria and nematodes. These proteins include,
but are not limited to, plant non-specific lipid acyl hydrolases, especially patatin; midgut-
effective plant cystatins, especially potato papain inhibitor; magainins, Zasloff (1987);
cecropins, Hultmark et al. (1982); attacins, Hultmark et al. (1983); melittin; gramicidin S,
Katsu et al. (1988); sodium channel proteins and synthetic fragments, Oiki et al. (1988);
the alpha toxin of Staphylococcus aureus, Tobkes et al. (1985); apolipoproteins and
fragments thereof, Knott et al. (1985) and Nakagawa et al. (1985); alamethicin and a variety
of synthetic amphipathic peptides, Kaiser et al. (1987); lectins, Lis et al. (1986) and Van
Parijs et al. (1991); pathogenesis-related proteins, Linthorst (1991); osmotins and
permatins, Vigers et al. (1992) and Woloscuk et al. (1991); chitinases; glucanases, Lewah
et al. (1991); thionins, Bohlmann and Apel (1991); protease inhibitors, Ryan (1990); plant
anti-microbial peptides, Cammue et al. (1992); and polypeptides from Bacillus
thuringiensis, which are postulated to generate small pores in the insect gut cell membrane,
Knowles et al. (1987) and Hofte and Whitely (1989).

The structural gene sequence will generally be one which originates from a plant of
a species different from that of the target organism. However, the present invention also
contemplates the root preferential expression of structural genes which originate from a
plant of the same species as that of the target plant but which are not natively expressed
under control of the native root preferential cationic peroxidase (per5) promoter.

The structural gene may be derived in whole or in part from a bacterial genome or 
episome, eukaryotic genomic, mitochondrial or plastid DNA, cDNA, viral DNA, or 
chemically synthesized DNA. It is possible that a structural gene may contain one or more
modifications in either the coding or the untranslated regions which could affect the
biological activity or the chemical structure of the expression product, the rate of
expression, or the manner of expression control. Such modifications include, but are not
limited to, mutations, insertions, deletions, rearrangements and substitutions of one or more
nucleotides. The structural gene may constitute an uninterrupted coding sequence or it may
include one or more introns, bounded by the appropriate plant-functional splice junctions.
The structural gene may be a composite of segments derived from a plurality of sources,
naturally occurring or synthetic. The structural gene may also encode a fusion protein, so
long as the experimental manipulations maintain functionality in the joining of the coding 
sequences. 

The use of a signal sequence to secrete or sequester in a selected organelle allows 
the protein to be in a metabolically inert location until released in the gut environment of an 
insect pathogen. Moreover, some proteins are accumulated to higher levels in transgenic
plants when they are secreted from the cells, rather than stored in the cytosol. See Hiatt et al.
(1989).

At the 3' terminus of the structural gene will be provided a termination sequence 
which is functional in plants. A wide variety of termination regions are available that may
be obtained from genes capable of expression in plant hosts, e.g., bacterial, opine, viral, and
plant genes. Suitable 3' UTRs include those that are known to those skilled in the art, such
as the nos 3', tmL 3', or acp 3', for example.

In preparing the constructs of this invention, the various DNA fragments may be 
manipulated, so as to provide for the DNA sequences in the proper orientation and, as
appropriate, in the proper reading frame. Adapters or linkers may be employed for joining
the DNA fragments or other manipulations may be involved to provide for convenient 
restriction sites, removal of superfluous DNA, removal of restriction sites, or the like. 

In carrying out the various steps, cloning is employed, so as to amplify a vector 
containing the promoter/gene of interest for subsequent introduction into the desired host
cells. A wide variety of cloning vectors are available, where the cloning vector includes a
replication system functional in E. coli and a marker which allows for selection of the
transformed cells. Illustrative vectors include pBR322, pUC series, pACYC184, Bluescript
series (Stratagene) etc. Thus, the sequence may be inserted into the vector at an appropriate
restriction site(s), the resulting plasmid used to transform the E. coli host (e.g., E. coli
strains HB101, JM101 and DH5α), the E. coli grown in an appropriate nutrient medium
and the cells harvested and lysed and the plasmid recovered. Analysis may involve
sequence analysis, restriction analysis, electrophoresis, or the like. After each manipulation
the DNA sequence to be used in the final construct may be restricted and joined to the next
sequence, where each of the partial constructs may be cloned in the same or different 
plasmids. 

Vectors are available or can be readily prepared for transformation of plant cells. In
general, plasmid or viral vectors should contain all the DNA control sequences necessary
for both maintenance and expression of a heterologous DNA sequence in a given host.
Such control sequences generally include, in addition to the maize root-preferential cationic
peroxidase promoter sequence (including a transcriptional start site), a leader sequence and
a DNA sequence coding for translation start-signal codon (generally obtained from either
the maize root-preferential cationic peroxidase gene or from the gene of interest to be
expressed by the promoter or from a leader from a third gene which is known to work well
or enhance expression in the selected host cell), a translation terminator codon, and a DNA
sequence coding for a 3' non-translated region containing signals controlling messenger
RNA processing. Selection of appropriate elements to optimize expression in any
particular species is a matter of ordinary skill in the art utilizing the teachings of this
disclosure; in some cases hybrid constructions are preferred, combining promoter elements
upstream of the tissue preferential promoter TATA and CAAT box to a minimal 35S
derived promoter consisting of the 35S TATA and CAAT box. Finally, the vectors should
desirably have a marker gene that is capable of providing a phenotypical property which
allows for identification of host cells containing the vector, and an intron in the 5'
untranslated region, e.g., intron 1 from the maize alcohol dehydrogenase gene that
enhances the steady state levels of mRNA of the marker gene.

The activity of the foreign gene inserted into plant cells is dependent upon the 
influence of endogenous plant DNA adjacent to the insert. Generally, the insertion of
heterologous genes appears to be random using any transformation technique; however,
technology currently exists for producing plants with site specific recombination of DNA
into plant cells (see WO/9109957). The particular methods used to transform such plant
cells are not critical to this invention, nor are subsequent steps, such as regeneration of such
plant cells, as necessary. Any method or combination of methods resulting in the
expression of the desired sequence or sequences under the control of the promoter is
acceptable.

Conventional technologies for introducing biological material into host cells
include electroporation, as disclosed in Shigekawa and Dower (1988), Miller et al. (1988),
and Powell et al. (1988); direct DNA uptake mechanisms, as disclosed in Mandel and Higa
(1972) and Dityatkin et al. (1972), Wigler et al. (1979) and Uchimiya et al. (1982);
fusion mechanisms, as disclosed in Uchidaz et al. (1980); infectious agents, as disclosed in
Fraley et al. (1986) and Anderson (1984); microinjection mechanisms, as disclosed in
Crossway et al. (1986); and high velocity projectile mechanisms, as disclosed in EPO 0
405 696.

Plant cells from monocotyledonous or dicotyledonous plants can be transformed
according to the present invention. Monocotyledonous species include barley, wheat,
maize, oat, sorghum and rice. Dicotyledonous species include tobacco, tomato,
sunflower, cotton, sugarbeet, potato, lettuce, melon, soybean and canola (rapeseed).

The appropriate procedure to transform a selected host cell may be chosen in 
accordance with the host cell used. Based on the experience to date, there appears to be
little difference in the expression of genes, once inserted into cells, attributable to the
method of transformation itself. Once introduced into the plant tissue, the expression of the
structural gene may be assayed in a transient expression system, or it may be determined 
after selection for stable integration within the plant genome. 

Techniques are known for the in vitro culture of plant tissue, and in a number of
cases, for regeneration into whole plants. The appropriate procedure to produce mature
transgenic plants may be chosen in accordance with the plant species used. Regeneration
varies from species to species of plants. Efficient regeneration will depend upon the
medium, on the genotype and on the history of the culture. Once whole plants have been
obtained, they can be sexually or clonally reproduced in such a manner that at least one
copy of the sequence is present in the cells of the progeny of the reproduction. Seed from
the regenerated plants can be collected for future use, and plants grown from this seed. 
Procedures for transferring the introduced gene from the originally transformed plant into 
commercially useful cultivars are known to those skilled in the art. 

Example 1 

Characterization Of A Maize Root-Preferential Cationic Peroxidase

The presence of peroxidase activity can be detected in situ in sodium dodecyl
sulfate polyacrylamide gels (SDS-PAGE) by incubation with H2O2 and a chromogenic
substrate such as 3,3'-diaminobenzidine. Tissue specific peroxidase activity was detected
by extraction of proteins from root, stem and leaf tissue of maize followed by detection in
gels according to Nakamura et al. (1988), essentially as follows. One
gram of maize tissue was macerated in a mortar in 1 mL extraction buffer, composed of 62.5
mM TrisHCl pH 6.8, 5 mM MgCl2, 0.5 M sucrose, and 0.1% ascorbic acid, centrifuged
and passed over a 0.2 µm filter to remove plant debris. Total protein was determined using
the Bradford protein assay. See Bradford (1976). Ten micrograms of protein of each tissue
was electrophoresed on an SDS-polyacrylamide gel. Beta-mercaptoethanol was omitted
from the sample buffer to retain enzyme activity. Following electrophoresis the gel was
washed two times in 50 mM TrisHCl pH 7.5 for 30 minutes each to remove SDS, and then
incubated in the assay solution, which was composed of 50 mM TrisHCl pH 7.5, 0.5
mg/mL diaminobenzidine and 0.01% hydrogen peroxide, for 10 minutes. Bands
corresponding to peroxidase activity were visualized by the formation of a brown
precipitate. Non-reduced molecular weight markers (Amersham Corporation) were run in a
parallel lane and visualized by standard protein staining in a separate incubation with
Coomassie Brilliant Blue. Peroxidase activity in the gel corresponding to a band migrating
at approximately 44 kD was only detected in root tissue and was not present in either leaf
or stem tissue. Identical patterns of peroxidase staining were produced when several
different maize genotypes were examined for root-specific peroxidase isozymes (B37 x
H84, Pioneer Hybrid 3737, B73).

Example 2

Isolation Of cDNA Clones Encoding The Maize Root-Preferential Cationic
Peroxidase

A. RNA isolation, cDNA synthesis and library construction.
Maize kernels (Zea mays hybrid B37 x H84) were germinated on filter paper under 
sterile conditions. At 6 days post germination root tissue was harvested and frozen in
liquid nitrogen and ground in a mortar and pestle until a fine powder was obtained. The
powder was added to 10 mLs of TLE buffer (0.2 M Tris HCl pH 8.2, 0.1 M LiCl, 5 mM
EDTA) containing 1% SDS and extracted with 50 mLs of TLE equilibrated phenol and 50
mLs of chloroform. The extraction was incubated on ice for 45 minutes with shaking, and
subsequently incubated at 50°C for 20 minutes. The aqueous phase was transferred to a
clean centrifuge tube following centrifugation, and reextracted twice with one half volume
of phenol/chloroform (1:1), followed by extractions with chloroform. RNA was
precipitated from the aqueous phase by addition of one third volume of 8 M LiCl and
incubation at 4°C for 24 hrs. The precipitate was collected by centrifugation, washed with
2 M LiCl and resuspended in 12 mLs of water. RNA was reprecipitated by addition of
an equal volume of 4 M LiCl, incubation at 4°C for 24 hrs and centrifugation. The RNA
pellet was resuspended in 2 mL of water and ethanol precipitated by addition of 200 µl 3 M
Na Acetate and 5.5 mL of ethanol and 16 hr incubation at -20°C, followed by
centrifugation. The final RNA pellet was resuspended in 1 mL water. The concentration
of the RNA was determined using measurement of the absorption at 260 nm. Messenger
RNA was purified by binding to and subsequent elution from polyA Quickkit™ columns
exactly as described by the supplier (Stratagene Cloning Systems, La Jolla, CA). The
concentration was determined by A260 measurement. cDNA was synthesized from 5
micrograms of poly A+ RNA using the ZAP-cDNA® synthesis kit, cloned into the Uni-
ZAP® vector, packaged into phage heads using Stratagene Gigapack Gold® packaging
extracts and infected and amplified on E. coli strain PLK-F' exactly according to the
protocols provided by the supplier (Stratagene). The titer of the resulting amplified library
was determined by plating on PLK-F' cells and was found to be 2.7 x 10^9 plaque forming
units (pfu)/mL.
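
The two quantitative steps in this section, RNA concentration from A260 and library titer from plaque counts, can be sketched as follows. This is an illustrative sketch only. The conversion factor of about 40 µg/mL per A260 unit is the conventional approximation for single-stranded RNA in a 1 cm cuvette and is not stated in the text; the readings, dilutions and plaque count in the example are hypothetical values, not data from this example.

```python
# Conventional approximation for single-stranded RNA, 1 cm path length.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate RNA concentration from an A260 reading of a diluted sample."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def phage_titer_pfu_per_ml(plaques: int, dilution: float, volume_plated_ml: float) -> float:
    """Titer of the undiluted stock from a plaque count on one plate.

    dilution is the fraction of stock in the plated sample (e.g. 1e-6).
    """
    return plaques / (dilution * volume_plated_ml)


# Hypothetical reading: A260 of 0.5 measured on a 1:100 dilution.
print(rna_concentration_ug_per_ml(0.5, dilution_factor=100))

# Hypothetical count: 270 plaques from 0.1 mL of a 10^-6 dilution,
# which corresponds to a titer of about 2.7 x 10^9 pfu/mL.
print(phage_titer_pfu_per_ml(270, dilution=1e-6, volume_plated_ml=0.1))
```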

B. Isolation of a peroxidase hybridization probe. A hybridization probe
corresponding to a central portion of peroxidase cDNA sequences was isolated as follows.
Sequence analysis of a number of cloned peroxidases indicated that there are several
domains in the predicted and/or determined amino acid sequences that are highly
conserved. See Lagrimini and Rothstein (1987). Two degenerate oligonucleotide primers
were synthesized against two conserved domains, taking into account a bias for C or G over
A or T in the third codon position in maize. Part of the first conserved domain,
FHDCFVNGC (corresponding to amino acids 41 through 49 of the tobacco peroxidase; see
Lagrimini and Rothstein (1987)), was reverse translated into the degenerate oligonucleotide
MM1: 5'-TTYCAYGAYTGYTTYGTYAAYGGBTG-3' (SEQ ID NO 3). Part of a second
conserved domain, VALSGAHT (corresponding to amino acids 161 through 168 of the
tobacco peroxidase; see Lagrimini and Rothstein (1987)), was reverse translated and reverse
complemented to give the degenerate oligonucleotide MM3:
5'-SGTRTGSGCSCCGSWSAGVGCSAC-3' (SEQ ID NO 4). In both oligonucleotides, Y
indicates the degeneracy C and T; R indicates A and G; S indicates C and G; W indicates A
and T; V indicates A, C, and G; and B indicates C, G, and T.
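
The degenerate code defined above can be checked mechanically. The sketch below is illustrative only; the function names are hypothetical. It counts how many distinct sequences each degenerate oligonucleotide represents, reverse complements MM3, and confirms against the standard genetic code that each complete codon can encode the corresponding residue of the conserved domain.

```python
from itertools import product

# Degenerate base codes as defined in the text.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "R": "AG", "S": "CG", "W": "AT",
    "V": "ACG", "B": "CGT",
}
# Complement of each code (S and W are their own complements).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C",
              "Y": "R", "R": "Y", "S": "S", "W": "W", "V": "B", "B": "V"}

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def degeneracy(oligo: str) -> int:
    """Number of distinct sequences represented by a degenerate oligo."""
    n = 1
    for base in oligo:
        n *= len(IUPAC[base])
    return n


def reverse_complement(oligo: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(oligo))


def can_encode(oligo: str, peptide: str) -> bool:
    """True if each complete codon has at least one expansion encoding its residue."""
    for i, residue in enumerate(peptide):
        codon = oligo[3 * i: 3 * i + 3]
        if len(codon) < 3:  # trailing partial codon is not checked
            break
        options = {CODON_TABLE["".join(p)]
                   for p in product(*(IUPAC[b] for b in codon))}
        if residue not in options:
            return False
    return True


MM1 = "TTYCAYGAYTGYTTYGTYAAYGGBTG"
MM3 = "SGTRTGSGCSCCGSWSAGVGCSAC"

print(degeneracy(MM1), can_encode(MM1, "FHDCFVNGC"))
print(degeneracy(MM3), can_encode(reverse_complement(MM3), "VALSGAHT"))
```

Under these definitions MM1 is a 384-fold mixture and MM3 a 768-fold mixture, and both are consistent with the peptide sequences from which they were derived.
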

Using the Polymerase Chain Reaction™ kit (Perkin Elmer Cetus) a 380 bp DNA
fragment was amplified using total root cDNA library DNA as template. The size of this
fragment corresponded well to the expected size based on the distance of the two domains
in peroxidase proteins, 128 amino acids corresponding to 384 nt. Following gel
purification the 380 nt fragment was radiolabeled using random primer labeling with an
Oligo Labeling™ kit (Pharmacia LKB Biotechnology, Inc, Piscataway, NJ) as per the
supplier's instructions with 50 microCuries [α-32P]dCTP.
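
The expected fragment size follows from the residue numbering given for the two conserved domains: the first primer begins at amino acid 41 and the second ends at amino acid 168 of the tobacco peroxidase. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def span_nt(first_residue: int, last_residue: int) -> int:
    """Nucleotides needed to encode residues first..last, inclusive."""
    return (last_residue - first_residue + 1) * 3


# Residues 41 through 168 are 128 amino acids, i.e. 384 nt.
print(span_nt(41, 168))
```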

C. Screening of the root cDNA library. Two hundred thousand phages were
plated on E. coli XL1 Blue cells (Stratagene) divided over ten plates. Duplicate plaque lift
filters were made of each plate. Filters were prehybridized and hybridized in a total
volume of 150 mLs of hybridization solution according to standard procedures (Sambrook
et al. 1989). The approximate concentration of labelled probe in the hybridization was 2.20
x 10^5 cpm/mL. Following hybridization filters were washed according to standard
procedures, air dried, covered and exposed to Kodak XAR5 film. Signals were determined
positive if they occurred in the same position on the two duplicate filters of one plate
relative to the markings. Putative positive phage were cored out of the plate and stored in 1
mL of SM buffer. Thirty four positive phage were rescreened twice to obtain a pure phage
stock using similar hybridization experiments as described above. DNA from all 34
positive phage cDNA clones was prepared by alkaline lysis minipreps following in vivo
rescue of phagemids according to the protocol provided by the supplier (Stratagene) and
digested with EcoRI and XhoI to release inserts. All plasmids contained one insert in the
size range of 1.3-1.4 kb which hybridized with the 380 nt peroxidase probe.

Example 3 

Analysis of maize root-preferential cationic peroxidase cDNA clone per5.
A. Analysis of expression pattern by Northern hybridization. RNA was
prepared from root, stem, leaf, kernel and tassel tissue as described in Example 2, section
A. Thirty micrograms of denatured total RNA of each tissue was electrophoresed on a 1%
agarose/Na phosphate gel, transferred to nylon membrane, and prehybridized and
hybridized with the labeled 380 nt peroxidase probe according to standard procedures. A
~1470 nt transcript was detected in root and stem RNA, but was absent from leaf, kernel
and tassel RNA. The level of the detected transcript in roots was at least 5.5 fold higher
than in stem tissue.




B. Sequence analysis of the per5 cDNA clone. Both strands of dsDNA from
the cDNA clone with the longest insert (per5) were sequenced using the Sequenase™
sequencing kit (United States Biochemical, Cleveland, OH). Sequencing was started using
the T3 and T7 primers and completed by walking along the DNA using sequencing primers
designed based on sequence derived in previous runs. The sequence of the per5 cDNA
insert is shown in SEQ ID NO 5. The per5 cDNA insert is 1354 nucleotides (nt) in length
and has a 5'-untranslated leader of 52 nt and a 275 nt 3' untranslated sequence before the
start of polyadenylation. It also contains the animal consensus polyadenylation signal
sequence AATAAA 34 nucleotides prior to the addition of a 28 nucleotide poly(A) tail.
The cDNA has an open reading frame of 999 bp, which spans nucleotides 53 to
1051. The first ATG codon in the cDNA sequence was chosen as the start of translation.
The predicted size of the mature maize peroxidase is 301 amino acids with a MW of 32,432
and an estimated pI of 9.09. The N-terminus of the mature protein was assigned by
alignment of the maize amino acid sequence with other published sequences and known
N-terminal sequences obtained by N-terminal amino acid sequencing. It is predicted from the
cDNA sequence that the protein is initially synthesized as a preprotein of MW 35,685 with
a 32-amino acid signal sequence that is 72% hydrophobic. The presence of this signal
sequence, which has also been observed in several other plant peroxidases, suggests that
the protein is taken up in the endoplasmic reticulum and modified for sub-cellular targeting
or secretion. This is supported by the presence of four potential N-glycosylation sites
(Asn-Xaa-Thr/Ser), which are at residues 53, 138, 181 and 279 of the putative mature
protein. The presence of four putative N-glycosylation sites suggests a role for
post-translational modification (e.g., glycosylation) and explains the discrepancy between the
observed (~44 kD) and predicted (~36 kD) size of the mature protein. Comparison of the deduced
amino acid sequence of the maize per5 cDNA with the published sequences of wheat (see
Hertig et al. (1991)), horseradish [C1] (see Fujiyama et al. (1988)), turnip [TP7] (see
Mazza and Welinder (1980)), peanut [PNC1] (see Buffard et al. (1990)), tobacco (see
Lagrimini et al. (1987)), and cucumber (see Morgens et al. (1990)) confirms that per5
encodes a peroxidase protein. There is >80% to >92% sequence similarity between these
seven plant peroxidases in four conserved domains. All seven peroxidases have eight
cysteines, conserved in position in the primary sequence. These cysteines in the
horseradish and turnip enzymes have been shown to be involved in intramolecular disulfide
linkages.
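
The feature lengths quoted above can be cross-checked, and the Asn-Xaa-Thr/Ser motif can be scanned for, with a small sketch. The Python below is an illustration under stated assumptions, not the analysis actually used: the helper names are hypothetical, the 999 nt reading frame is taken to be 333 sense codons (stop codon not counted), the exclusion of proline at the Xaa position is a common convention that the text does not state, and `TOY` is an invented peptide, not the per5 protein.

```python
import re

# Feature lengths (nt) as quoted in the text for the per5 cDNA insert.
LEADER, ORF, UTR3, POLY_A = 52, 999, 275, 28
SIGNAL_PEPTIDE_AA = 32


def orf_coordinates(leader, orf):
    """1-based inclusive (start, end) of an ORF that directly follows a leader."""
    return leader + 1, leader + orf


def find_sequons(protein):
    """1-based positions of Asn in Asn-Xaa-Ser/Thr motifs.

    Assumption: Xaa is any residue except Pro (not stated in the text).
    The lookahead lets overlapping motifs be reported.
    """
    return [m.start() + 1 for m in re.finditer(r"N(?=[^P][ST])", protein)]


# Invented toy peptide used only to exercise the scanner.
TOY = "MANGTAQNPSLLNKSA"

if __name__ == "__main__":
    print("insert length:", LEADER + ORF + UTR3 + POLY_A)
    print("ORF span:", orf_coordinates(LEADER, ORF))
    print("mature residues:", ORF // 3 - SIGNAL_PEPTIDE_AA)
    print("toy sequons:", find_sequons(TOY))
```

With these inputs the four features add up to the stated 1354 nt insert, the reading frame falls on nucleotides 53 to 1051, and removing a 32-residue signal peptide from 333 residues leaves the stated 301.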

Example 4 

Isolation of the maize root-preferential cationic peroxidase genomic clone

A. Genomic DNA Blot Hybridization. Genomic DNA was isolated from a
maize diploid, homozygous line (B73). The DNA was digested with the restriction
enzymes EcoRI, HindIII, and SacI, fractionated on a 1% agarose gel, transferred
to membrane, and hybridized to both a 32P-labeled per5 full-length cDNA and a per5
cDNA gene-specific probe (GSP5). The 136 bp GSP5 probe was amplified by PCR using
the per5 cDNA clone as template DNA and primers MM21: 5'-GTCATAGAACTGTGGG-3'
(SEQ ID NO 6); and MM22: 5'-ATAACATAGTACAGCG-3' (SEQ ID NO 7). This
probe is composed of nt 25-160 of the per5 cDNA clone and includes 27 bp of the 5'
untranslated sequence, the entire coding sequence for the putative endoplasmic reticulum
signal peptide and 7 bp which code for the amino-terminus of the putative per5 mature
domain.

Using the per5 cDNA full length probe, two strong hybridization signals were
detected in each digest. This suggested that the per5 gene may be present in two copies per
haploid genome. However, using GSP5 as a probe, only one band per lane was detected,
which suggested that there is only one copy of the per5 gene per haploid genome and that
the other hybridizing band on the genomic DNA blot corresponds to more distantly related
sequences. This also demonstrated that probe GSP5 was gene specific and would be
suitable for the isolation of the peroxidase genomic clone from a maize genomic library.
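
As a small illustration of the probe description, the sketch below recomputes the probe length from its stated coordinates and shows how a reverse primer relates to the strand it is read from. It is a hedged example only: the helper names are hypothetical, and treating MM22 as the reverse (antisense) primer is an assumption, since the text does not say which primer is which.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

MM21 = "GTCATAGAACTGTGGG"  # SEQ ID NO 6
MM22 = "ATAACATAGTACAGCG"  # SEQ ID NO 7


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def span_length(start, end):
    """Length of a 1-based inclusive coordinate range."""
    return end - start + 1


if __name__ == "__main__":
    print("GSP5 length:", span_length(25, 160), "bp")
    # Assumption: if MM22 is the reverse primer, this is the sequence
    # expected at the 3' end of the probe on the sense strand.
    print("MM22 reverse complement:", reverse_complement(MM22))
```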

B. Isolation of the root-preferential cationic peroxidase gene from a maize W22
library. Approximately 2 x 10* plaques of a maize W22 genomic library (Clontech
Laboratories, Inc., Palo Alto, CA) were screened using GSP5 as the probe according to
standard protocol for library screening. GSP5 was used as probe because it would
recognize only the genomic clones corresponding to the per5 cDNA clone. Ten genomic
clones were isolated and plaque purified. The clones were plate amplified to increase their
titers; liquid lysates were grown up and phage DNA was isolated from these cultures.
Restriction analysis on nine of the ten clones using SalI, which liberates the genomic DNA
inserts from the phage arms, showed that eight of the nine clones had the same SalI
banding pattern. These eight clones contained ~14.9 Kb inserts which could be cut into
two SalI fragments of ~10.4 Kb and ~4.5 Kb, respectively. The ninth clone (perGEN19)
contained an ~15.6 Kb insert which upon SalI digestion yields two fragments, ~13.1 Kb
and ~2.5 Kb in size. Restriction and DNA hybridization analysis suggest that perGEN19
contains an insert which overlaps with the Sau3A inserts of the other 8 clones. A
representative of the eight identical genomic clones (perGEN1) was further analyzed. The
~10.4 Kb fragment was subcloned into the SalI site of the plasmid pBluescript®II SK(-)
(Stratagene, Inc.) generating plasmid perGEN1(10.44). Restriction digests (using ApaI,
BamHI, EcoRI, HindIII, KpnI, NcoI, SacI, and XbaI) and DNA blot hybridization analyses
(using either the full-length per5 cDNA or GSP5 as probes) indicated that the 10.44 Kb
SalI fragment on perGEN1 contained the peroxidase sequences. Further restriction digests
using single and double digests of HindIII, KpnI, SacI, and XbaI and DNA blot
hybridization analyses using gel-purified KpnI perGEN1(10.44) fragments as probes were
performed on perGEN1(10.44).
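
The insert and fragment sizes reported for the genomic clones can be sanity-checked by confirming that the SalI fragments add up to the insert. The sketch below is illustrative; the function name and the 0.1 Kb tolerance are assumptions chosen for this example, since gel-estimated sizes are approximate.

```python
import math


def fragments_match_insert(insert_kb, fragments_kb, tol_kb=0.1):
    """True if the fragment sizes sum to the insert size within a tolerance (Kb)."""
    return math.isclose(sum(fragments_kb), insert_kb, abs_tol=tol_kb)


# Sizes (Kb) as quoted in the text.
CLONES = {
    "perGEN1 (and seven identical clones)": (14.9, [10.4, 4.5]),
    "perGEN19": (15.6, [13.1, 2.5]),
}

if __name__ == "__main__":
    for name, (insert, fragments) in CLONES.items():
        print(name, fragments_match_insert(insert, fragments))
```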

Example 5 

Sequence of the maize root-preferential cationic peroxidase gene

A total of 6550 nt of genomic sequence covering the maize root-preferential
cationic peroxidase gene and its 5' and 3' flanking sequences was obtained by sequencing
overlapping subfragments of plasmid perGEN1(10.44) which hybridized with the GSP5
probe described in Example 4 as well as the per5 cDNA insert. The sequence is shown in
SEQ ID NO 1. The sequencing procedures were standard techniques known to those
skilled in the art. The upstream flanking region from the 5'-most NcoI site to the putative
start site of translation was determined to be 4200 nt in length. The maize root-preferential
cationic peroxidase gene is composed of four exons: exon 1 (225 bp), exon 2 (192 bp), exon 3
(166 bp), and exon 4 (416 bp). The GC-content of the exons is 54.7%. The sequence of
the compiled exon sequences was 100% identical to that of the coding region for the per5
cDNA. Translation of these exons resulted in a deduced protein sequence that is 100%
identical to the deduced protein sequence for the per5 cDNA sequence. Three introns were
found: intron 1 (633 bp, %AU = 62.7, %U = 33.8), intron 2 (132 bp, %AU = 63.6, %U =
35.6), and intron 3 (101 bp, %AU = 65.3, %U = 37.6). The downstream flanking region
from the UGA codon to the 3'-most XbaI site was found to be 373 bp in length. The intron
splice sites did not fit the putative monocot 5' and 3' splice site consensus sequences
perfectly, but did follow the mammalian "GU/AG rule" for splice sites. The intron
sequences also conformed to the definition of maize intron sequences suggested by Walbot.
See Walbot et al. (1991).
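
The exon and intron lengths above lend themselves to a quick consistency check. The following sketch is illustrative only: the helper names are hypothetical, the coordinates it prints are relative to the first nucleotide of the start codon (an assumption made for this example, not SEQ ID NO 1 numbering), and `au_percent` simply shows how an %AU figure of the kind quoted for the introns could be computed from a sequence.

```python
EXONS = [225, 192, 166, 416]   # bp, as quoted in the text
INTRONS = [633, 132, 101]      # bp, as quoted in the text


def gene_layout(exons, introns):
    """Return (name, start, end) tuples, 1-based inclusive, exons and introns interleaved.

    Position 1 is taken to be the first nucleotide of exon 1 (an assumption).
    """
    layout, pos = [], 1
    for i, exon in enumerate(exons):
        layout.append(("exon %d" % (i + 1), pos, pos + exon - 1))
        pos += exon
        if i < len(introns):
            layout.append(("intron %d" % (i + 1), pos, pos + introns[i] - 1))
            pos += introns[i]
    return layout


def au_percent(rna):
    """Percentage of A plus U (T is counted as U) in a sequence."""
    seq = rna.upper().replace("T", "U")
    return 100.0 * sum(seq.count(base) for base in "AU") / len(seq)


if __name__ == "__main__":
    print("exon total:", sum(EXONS), "bp")
    for feature in gene_layout(EXONS, INTRONS):
        print(*feature)
```

The exon lengths sum to 999 bp, which agrees with the 999 bp open reading frame reported for the per5 cDNA.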

Example 6
pDAB 406

This Example describes pDAB 406, a vector designed for testing of promoter
activity in both transient and stable transformation experiments. The complete sequence
for pDAB 406 is given in SEQ ID NO 8. With reference to SEQ ID NO 8, significant
features of pDAB 406 are given in Table 1.

Table 1: Features of pDAB 406

nt (SEQ ID NO 8)    Features
1-6                 ApaI site
7-24                multiple cloning site (Mel, KpnI, SmaI)
25-30               SalI site
32-1840             E. coli uidA reporter gene encoding the beta-glucuronidase protein
                    (GUS) from pKA882 and TGA stop codon
1841-1883           3' untranslated region from pBI221
1894-1899           SstI site
1900-2168           nopaline synthetase 3' polyA sequence (nos 3'UTR)
2174-2179           HindIII site
2180-2185           BglII site
2186-2932           a modified CaMV 35S promoter
2195-2446           MCASTRAS nt 7093-7344
2455-2801           MCASTRAS nt 7093-7439
2814-2932           synthetic Maize Streak Virus (MSV) untranslated leader containing
                    the maize Adh1 intron 1
2933-2938           BglII/BclI junction
2933-3023           Adh1.S nt 269-359 MZEADH1.S
3024-3141           Adh1.S nt 704-821 MZEADH1.S
--                  BamHI/BglII junction
3150-3187           synthetic MSV leader containing the maize Adh1 intron 1
3188-3193           NcoI
3190-4842           internal reference gene composed of the firefly luciferase gene (Lux)
4907-5165           nopaline synthetase 3' polyA sequence (nos 3'UTR)
5172-5177           BglII site
5178-5183           NdeI site
5186-5191           SstI site
5195-5672           nt 6972-6495 MCASTRAS (CaMV 35S promoter)
5680-6034           nt 7089-7443 MCASTRAS (CaMV 35S promoter)
6042-7021           Tn5 nt 1539-2518; mutated 2X
6054-6848           a selectable marker gene composed of the bacterial NPTII gene
                    encoding neomycin phosphotransferase which provides resistance to
                    the antibiotics kanamycin, neomycin and G418
7022-7726           3' UTR of ORF26 gene, Agrobacterium tumefaciens Ti plasmid (pTi
                    15955, nt 22438 to 21726)
7727-7732           NdeI site
7733-7914           pUC19 nt 1-182, reverse complement
7915-10148          nt 453 to 2686 pUC19, reverse complement
10149-10160         multiple cloning site, HindIII, SstI

The vector can readily be assembled by those skilled in the art using well known
methods.

Example 7 
pDAB 411

This Example describes plasmid pDAB 411, which is an 11784 bp plasmid that has a
pUC19 backbone and contains a gene cassette comprising 1.6 kb of per5 promoter, the per5
untranslated leader, the GUS gene, and the nos 3' UTR. No intron is present in the
untranslated leader of pDAB 411. The complete sequence for pDAB 411 is given in SEQ
ID NO 9. With reference to SEQ ID NO 9, significant features of pDAB 411 are given in
Table 2.

Table 2: Significant Features of pDAB 411

nt (SEQ ID NO 9)    Feature
1-6                 ApaI site
7-1648              Per5 promoter and untranslated leader sequence (corresponding to
                    nt 2559 to 4200 of SEQ ID NO 1)
1649-1654           SalI site
1656-3464           E. coli uidA reporter gene encoding the beta-glucuronidase protein
                    (GUS)
3465-3507           3' untranslated region from pBI221
3518-3523           SstI site
3524-3792           nopaline synthetase 3' polyA sequence (nos 3'UTR)
3793-11784          corresponds to 2169 to 10160 of pDAB 406 SEQ ID NO 8



appreciable GUS expression. This failure is consistent with our discovery that certain 
tissue preferential maize promoters require the presence of an intron in the transcribed 
portion of the gene for significant expression to be observed. 
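
Because the pDAB 411 feature table maps several ranges back onto SEQ ID NO 1 and SEQ ID NO 8, a simple length comparison can catch transcription errors in the coordinates. The sketch below is an illustration only; the dictionary layout and function names are hypothetical, and the ranges are copied from the feature tables for pDAB 406 and pDAB 411.

```python
def length(start, end):
    """Length of a 1-based inclusive coordinate range."""
    return end - start + 1


# (range in pDAB 411, corresponding range in the source sequence)
MAPPINGS = {
    "per5 promoter and leader (SEQ ID NO 1)": ((7, 1648), (2559, 4200)),
    "GUS coding region (pDAB 406)": ((1656, 3464), (32, 1840)),
    "3' untranslated region from pBI221 (pDAB 406)": ((3465, 3507), (1841, 1883)),
    "nos 3'UTR (pDAB 406)": ((3524, 3792), (1900, 2168)),
    "remainder of pDAB 406": ((3793, 11784), (2169, 10160)),
}


def inconsistent(mappings):
    """Return the names of mappings whose two ranges differ in length."""
    return [name for name, (a, b) in mappings.items() if length(*a) != length(*b)]


if __name__ == "__main__":
    for name, (a, b) in MAPPINGS.items():
        print(name, length(*a), length(*b))
    print("inconsistent:", inconsistent(MAPPINGS))
```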

Example 8 
pDAB 419 

This Example describes construction of plasmid pDAB 419, which is an 11991 bp
plasmid that is identical to pDAB 411, except that the untranslated leader preceding the
GUS gene includes a 207 bp sequence comprising a deleted version of the maize Adh1 intron
1. The complete sequence for pDAB 419 is given in SEQ ID NO 10. With reference to
SEQ ID NO 10, critical features of pDAB 419 are as follows:
Table 3: Critical Features of pDAB 419

nt (SEQ ID NO 10)   Feature
1-6                 ApaI site
7-1648              Per5 promoter and untranslated leader sequence
                    (corresponding to nt 2559 to 4200 of SEQ ID NO 1)
1649-1855           deleted version of maize Adh1 intron 1
                    corresponding to nt 2939-3145 of SEQ ID NO 8
1856-1861           SalI site
1863-3671           E. coli uidA reporter gene encoding the
                    beta-glucuronidase protein (GUS)
3672-3714           3' untranslated region from pBI221
3725-3730           SstI site
3731-3999           nopaline synthetase 3' polyA sequence (nos 3'UTR)
4000-11991          corresponds to 2169 to 10160 of pDAB 406 SEQ ID NO 8






techniques. More specifically, the per5 promoter in plasmid pDAB 411 was amplified with
primers MM88: 5'-ACGTACGTACGGGCCCACCACTGTTGTAACTTGTAAGCC-3'
(SEQ ID NO 11) and OF192: 5'-AGGCGGACCTTTGCACTGTGAGTTACCTTCGC-3'
(SEQ ID NO 12). The modified Adh1 intron 1, corresponding to nt 2939 to 3145 of SEQ
ID NO 8, was amplified from plasmid pDAB 406 using primers OF190:
5'-CTCTGTCGACGAGCGCAGCTGCACGGGTC-3' (SEQ ID NO 13) and OF191:
5'-GCGAAGGTAACTCACAGTGCAAAGGTCCGCCT-3' (SEQ ID NO 14). Following
amplification, both fragments were purified through a 1% agarose gel. Splice Overlap
Extension PCR was used to join the per5 promoter fragment to the Adh1 intron 1 fragment.
Samples (2.5 uL) of each gel-purified fragment were mixed and re-amplified using primers
MM88 and OF192 (SEQ ID NOS 11 and 12). The resulting 1.6 kB per5adh fragment was
digested with ApaI and SalI, gel-purified, and ligated into pDAB 406 which was digested
with ApaI and SalI, resulting in an 11,991 bp plasmid, pDAB 419.
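
Splice overlap extension depends on the two inner primers sharing a complementary overlap, and the cloning step depends on the outer primers carrying the restriction sites. The sketch below checks both properties on the primer sequences given above. It is an illustration only: the helper names are hypothetical, and the recognition sequences used (GGGCCC for ApaI, GTCGAC for SalI) are the standard ones rather than anything stated in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PRIMERS = {
    "MM88": "ACGTACGTACGGGCCCACCACTGTTGTAACTTGTAAGCC",   # SEQ ID NO 11
    "OF192": "AGGCGGACCTTTGCACTGTGAGTTACCTTCGC",         # SEQ ID NO 12
    "OF190": "CTCTGTCGACGAGCGCAGCTGCACGGGTC",            # SEQ ID NO 13
    "OF191": "GCGAAGGTAACTCACAGTGCAAAGGTCCGCCT",         # SEQ ID NO 14
}

APAI_SITE = "GGGCCC"  # standard ApaI recognition sequence
SALI_SITE = "GTCGAC"  # standard SalI recognition sequence


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def are_overlap_partners(primer_a, primer_b):
    """True if two primers are exact reverse complements of each other."""
    return reverse_complement(primer_a) == primer_b


if __name__ == "__main__":
    print("OF191/OF192 overlap:",
          are_overlap_partners(PRIMERS["OF192"], PRIMERS["OF191"]))
    print("MM88 has ApaI site:", APAI_SITE in PRIMERS["MM88"])
    print("OF190 has SalI site:", SALI_SITE in PRIMERS["OF190"])
    # Inserting the 207 bp intron into the 11784 bp pDAB 411 layout
    print("expected pDAB 419 size:", 11784 + 207)
```

On these sequences OF191 and OF192 are exact reverse complements of one another, MM88 contains an ApaI site and OF190 contains a SalI site, and adding the 207 bp intron to the 11784 bp pDAB 411 gives the stated 11,991 bp.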

Example 9 
Transformation of Rice with pDAB 419

This example describes transformation of rice with pDAB 419, and the
histochemical and quantitative patterns of GUS expression in the transformed rice plants.

A. Transgenic Production.

1. Plant Material and Callus Culture. For initiation of embryogenic callus, mature
seeds of a Japonica cultivar, Taipei 309, were dehusked and surface-sterilized in 70%
ethanol for 2-5 min followed by a 30-45 min soak in 50% commercial bleach (2.6%
sodium hypochlorite) with a few drops of 'Liquinox' soap. The seeds were then rinsed 3
times in sterile distilled water and placed on filter paper before transferring to 'induction'
media (NB). The NB medium consisted of N6 macro elements (Chu, 1978), B5 micro
elements and vitamins (Gamborg et al., 1968), 300 mg/L casein hydrolysate, 500 mg/L
L-proline, 500 mg/L L-glutamine, 30 g/L sucrose, 2 mg/L 2,4-dichloro-phenoxyacetic acid
(2,4-D), and 2.5 g/L Gelrite (Schweizerhall, NJ) with the pH adjusted to 5.8. The mature seed
cultured on 'induction' media were incubated in the dark at 28°C. After 3 weeks of culture,
the emerging primary callus induced from the scutellar region of the mature embryo was
transferred to fresh NB medium for further maintenance.

2. Plasmids and DNA Precipitation. pDAB354 containing 351-hpt (hygromycin
phosphotransferase providing resistance to the antibiotic hygromycin; described in
Example 25) was used in cotransformations with pDAB 419. About 140 ug of DNA was
precipitated onto 60 mg of gold particles. The plasmid DNA was precipitated onto 1.5-3.0
micron (Aldrich Chemical Co., Milwaukee, WI) or 1.0 micron (Bio-Rad) gold particles.
The precipitation mixture included 60 mg of pre-washed gold particles, 300 uL of
water/DNA (140 ug), 74 uL of 2.5 M CaCl2, and 30 uL of 0.1 M spermidine. After adding
the components in the above order, the mixture was vortexed immediately, and allowed to
settle for 2-3 min. Then, the supernatant was pipetted off and discarded. The DNA-coated
gold particles were resuspended in 1 mL of 100% ethanol and diluted to 17.5 ug DNA/7.5
mg gold per mL of ethanol for use in blasting experiments.
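
The dilution step above can be worked through numerically. The sketch below is a hedged illustration: the function name is hypothetical, and it assumes that all of the DNA stays bound to the gold and that no gold is lost when the supernatant is removed, neither of which is stated in the text.

```python
def dilute_coated_gold(dna_ug, gold_mg, resuspend_ml, target_gold_mg_per_ml):
    """Return (final volume in mL, DNA in ug per mL, ethanol to add in mL).

    Assumes complete DNA binding and no loss of gold (illustrative only).
    """
    final_volume_ml = gold_mg / target_gold_mg_per_ml
    dna_ug_per_ml = dna_ug / final_volume_ml
    ethanol_to_add_ml = final_volume_ml - resuspend_ml
    return final_volume_ml, dna_ug_per_ml, ethanol_to_add_ml


if __name__ == "__main__":
    # 140 ug DNA on 60 mg gold, resuspended in 1 mL, diluted to 7.5 mg gold per mL
    print(dilute_coated_gold(140.0, 60.0, 1.0, 7.5))
```

Under those assumptions 60 mg of gold at 7.5 mg per mL corresponds to 8 mL of suspension, which carries 17.5 ug of DNA per mL, matching the figure in the text.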

3. Helium Blasting into Embryogenic Callus and Selection. Actively growing
embryogenic callus cultures, 2-4 mm in size, were subjected to a high osmoticum
treatment. This treatment included placing callus on NB medium with 0.2 M mannitol
and 0.2 M sorbitol (Vain et al., 1993) for 4 hrs before helium blasting. Following
osmoticum treatment, callus cultures were transferred to 'blasting' medium (NB+2% agar)
and covered with a stainless steel screen (230 micron). Helium blasting involved
accelerating the suspended DNA-coated gold particles towards and into the prepared tissue
targets. The device used was an earlier prototype to the one described in US Patent
#5,141,131 which is incorporated herein by reference, although both function in a similar
manner. The callus cultures were blasted at different helium pressures (1,750-2,250 psi)
once or twice per target. After blasting, callus was transferred back to the media with high
osmoticum overnight before placing on selection medium, which consisted of NB medium
with 30 mg/L hygromycin. After 2 weeks, the cultures were transferred to fresh selection
medium with higher concentrations of selection agent, i.e., NB+50 mg/L hygromycin (Li et
al., 1993).

4. Regeneration. Compact, white-yellow, embryogenic callus cultures, recovered
on NB+50 mg/L hygromycin, were regenerated by transferring to 'pre-regeneration' (PR)
medium + 50 mg/L hygromycin. The PR medium consisted of NB medium with 2 mg/L
6-benzylaminopurine (BAP), 1 mg/L naphthaleneacetic acid (NAA), and 5 mg/L abscisic
acid (ABA). After 2 weeks of culture in the dark, they were transferred to 'regeneration'
(RN) medium. The composition of RN medium is NB medium with 3 mg/L BAP and 0.5
mg/L NAA. The cultures on RN medium were incubated for 2 weeks at 28°C under high
fluorescent light (325 ft-candles). The plantlets with 2 cm shoots were transferred to 1/2
MS medium (Murashige and Skoog, 1962) with 1/2 B5 vitamins, 10 g/L sucrose, 0.05
mg/L NAA, 50 mg/L hygromycin and 2.5 g/L Gelrite adjusted to pH 5.8 in magenta boxes.
When plantlets were established with a well-developed root system, they were transferred to
soil (1 metromix: 1 top soil) and raised in a growth chamber or greenhouse (29/24°C
day/night cycle, 50-60% humidity, 12 h photoperiod) until maturity. A total of 23
hygromycin-resistant callus lines were established.
B. GUS histochemical assays

GUS histochemical assays were conducted according to Jefferson (1987). Tissues
were placed in 24-well microtitre plates (Corning, New York, NY) containing 500 uL of
assay buffer per well. The assay buffer consisted of 0.1 M sodium phosphate (pH 8.0), 0.5
mM potassium ferricyanide, 0.5 mM potassium ferrocyanide, 10 mM sodium EDTA, 1.9
mM 5-bromo-4-chloro-3-indolyl-beta-D-glucuronide, and 0.06% Triton X-100. The plates
were incubated in the dark for 1-2 days at 37°C before observations under a microscope.
Fourteen of the 23 hygromycin resistant rice lines expressed the GUS gene as evidenced by
blue staining after 48 hours in the GUS histochemical assay. Nine of the 14 GUS
expressing lines were further characterized (Table 4).

Table 4: Histochemical GUS Staining of Transgenic Rice Callus

Line          Rating
354/419-03    i i ti-
354/419-04    ll M
354/419-07    ++++
354/419-11    +++
354/419-12    ++
354/419-13    +++
354/419-15    ++
354/419-18    +++
354/419-21    ++

+ = Occasional blue region
++ = Light blue staining throughout
+++ = Dark blue regions
++++ = Intense blue staining throughout

C. Southern Analysis

Southern analysis was used to identify primary regenerant (R0) plant lines from rice
that contained an intact copy of the transgene and to measure the complexity of the
integration event. Several leaves from each rice plant were harvested and up to five plants
were sampled individually from each line. Genomic DNA from the rice R0 plants was
prepared from lyophilized tissue as described by Saghai-Maroof et al. (1984). Eight
micrograms of each DNA was digested with the restriction enzyme XbaI using conditions
suggested by the manufacturer (Bethesda Research Laboratory, Gaithersburg, MD) and
separated by agarose gel electrophoresis. The DNA was blotted onto nylon membrane as
described by Southern (1975, 1980).

A probe specific for the β-glucuronidase (GUS) coding region was excised from the
pDAB 419 plasmid using the restriction enzymes NcoI and SstI. The resulting 1.9 kb
fragment was purified with the Qiaex II DNA purification kit (Qiagen Inc., Chatsworth,
CA). The probe was prepared using an oligo-labeling kit (Pharmacia LKB, Piscataway,
NJ) with 50 microcuries of α-32P-dCTP (Amersham Life Science, Arlington Heights, IL).
The GUS probe hybridized to the genomic DNA on the blots. The blots were washed at
60°C in 0.25X SSC and 0.2% SDS for 45 minutes, blotted dry and exposed to XAR-5 film
overnight with two intensifying screens.

D. GUS Quantification
1. Tissue Preparation. Histochemically GUS positive plantlets, grown in magenta
boxes, were dissected into root and leaf tissues. Duplicate samples of approximately 300
mg root and 100 mg leaf were transferred to a 1.5 ml sterile sample tube (Kontes, Vineland,
NJ) and placed on ice prior to freezing at -80°C. Extraction of proteins consisted of
grinding tissue using a stainless steel Kontes Pellet Pestle powered by a 0.35 amp, 40 Watt
motor (Model 102, Rae Corp., McHenry, IL), at a setting of "40". GUS Lysis buffer from
the GUS-Light™ assay kit (Tropix, Bedford, MA) was modified with the addition of 20%
glycerol to produce the extraction buffer. Before grinding, frozen samples were placed
on ice and aliquots of 100 ul extraction buffer were added to the sample tube. Tissue was
homogenized in approximately four 25-second intervals during which additional aliquots of
extraction buffer were added for a final volume of 300 ul for root and 200 ul for leaf
tissues. Samples were maintained on ice until all sample grinding was completed.
Samples were then centrifuged twice at 5°C for 8 minutes at full speed (Eppendorf
Centrifuge Model 5415). Supernatant was transferred to sterile microcentrifuge tubes on
ice and later used to quantitate proteins and GUS; the pellet was discarded.

2. Total Protein Quantification. Quantification of extractable proteins was
determined with the Bio-Rad Protein Assay kit (Bio-Rad Laboratories, Hercules, CA). A
protein standard made from bovine albumin (Sigma, St. Louis, MO) was used to obtain a
standard curve from zero to 10 ug/ml. Duplicate samples for each tissue were prepared
using 5 ul of protein extract with 5 ul GUS lysis buffer in a sterilized microcentrifuge tube.
Water was added to bring the volume up to 800 ul before 200 ul dye reagent was added.
Tubes were vortexed, then incubated at room temperature for at least 5 minutes before the
liquid was transferred into 1.5 ml cuvettes and placed in the spectrophotometer (Shimadzu,
Japan). Absorbance measurements were made at 595 nm.
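
Reading a protein concentration off a standard curve of this kind amounts to a linear fit followed by a back-calculation for the dilution. The sketch below shows one way to do this and is not the procedure actually used: the function names are hypothetical, the absorbance values for the standards are invented for illustration, and a linear response over the zero to 10 ug/ml range is assumed.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extract_protein_ug_per_ul(a595, slope, intercept,
                              assay_volume_ml=1.0, extract_ul=5.0):
    """Protein concentration of the original extract (ug per ul).

    The cuvette holds 800 ul sample plus 200 ul dye (1.0 ml) containing 5 ul extract.
    """
    cuvette_ug_per_ml = (a595 - intercept) / slope
    return cuvette_ug_per_ml * assay_volume_ml / extract_ul


# Hypothetical standard curve readings (invented values, for illustration only).
STANDARDS_UG_PER_ML = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
STANDARDS_A595 = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55]

if __name__ == "__main__":
    slope, intercept = fit_line(STANDARDS_UG_PER_ML, STANDARDS_A595)
    print(extract_protein_ug_per_ul(0.30, slope, intercept))
```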
3. GUS Quantification. Analysis of GUS activity required the use of the
GUS-Light™ assay kit and an automatic luminescence photometer (Model 1251 Luminometer
and Model 1291 Dispenser, Bio-Orbit, Finland). For each sample, a relative level of GUS
activity was measured on 1 ul extract. From the initial reading, sample volumes were
scaled up to between 2 and 10 ul of extract per luminometer vial while remaining within the
detection limits of the equipment. Samples were prepared in triplicate, and 180 ul
aliquots of GUS-Light™ reaction buffer were added to each luminometer vial at 10-second
intervals. After a one hour incubation at room temperature in the dark, the vials were
loaded into the sample holder of the luminometer. As each vial entered the measuring
chamber, 300 ul of GUS-Light™ Light Emission Accelerator Buffer was added and
luminescence was detected over a 5-second integration period. A "blank reaction" was
included in the assay, using 10 ul of the GUS extraction buffer. A GUS standard, prepared
to read 8,000 relative light units (RLU) from commercially available β-glucuronidase
(Sigma, MO), was used to confirm the sensitivity of the equipment and reagents used.
GUS readings (RLU) were corrected for the "blank" and the GUS standard readings before
dividing by ug total protein.
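
The final normalization can be sketched as follows. This is an assumption-laden illustration rather than the calculation actually performed: the text does not say how the GUS standard reading enters the correction, so the sketch assumes the net reading is rescaled so that the standard reads its nominal 8,000 RLU; the function and parameter names are hypothetical.

```python
def normalized_gus(sample_rlu, blank_rlu, protein_ug,
                   standard_rlu=None, nominal_standard_rlu=8000.0):
    """Blank-corrected RLU per ug total protein.

    If a standard reading is given, the net signal is rescaled so the standard
    would read its nominal value (an assumed form of the standard correction).
    """
    net = sample_rlu - blank_rlu
    if standard_rlu is not None:
        net *= nominal_standard_rlu / (standard_rlu - blank_rlu)
    return net / protein_ug


if __name__ == "__main__":
    # Invented readings, for illustration only.
    print(normalized_gus(1100.0, 100.0, 10.0))
    print(normalized_gus(1100.0, 100.0, 10.0, standard_rlu=4100.0))
```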



Table 5: GUS Expression in Rice Plants Transformed with pDAB 419

              Presence of   Number of        Relative light units
              Intact        Hybridization    per mg protein
Line          Construct     Products         Root        Leaf
354/419-03    yes           10               n.d.        n.d.
354/419-04    yes           4                795         579
354/419-07    yes           1                22341       23407
354/419-11    n.d.          n.d.             1077        215
354/419-12    n.d.          n.d.             n.d.        n.d.
354/419-13    yes           9                736         346
354/419-15    yes           2                208         208
354/419-18    yes           7                230         62
354/419-21    yes           3                186         56

n.d. = not determined

Rice plants regenerated from transgenic callus stained positively for GUS in both 
roots and leaves indicating constitutive expression. It was not expected that constitutive 
expression of GUS would be observed from the pDAB419 construct because of the lack of 
expression in the leaves of the native per5 gene in maize. 
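
One way to summarize the root and leaf readings in Table 5 is a per-line root-to-leaf ratio. The sketch below is illustrative only; the dictionary and function names are hypothetical, and lines without both readings are omitted.

```python
# (root, leaf) relative light units per unit protein, copied from Table 5.
TABLE5 = {
    "354/419-04": (795, 579),
    "354/419-07": (22341, 23407),
    "354/419-11": (1077, 215),
    "354/419-13": (736, 346),
    "354/419-15": (208, 208),
    "354/419-18": (230, 62),
    "354/419-21": (186, 56),
}


def root_leaf_ratio(root, leaf):
    """Ratio of root to leaf GUS activity."""
    return root / leaf


def ratios(table):
    """Map each line to its root/leaf ratio."""
    return {line: root_leaf_ratio(*values) for line, values in table.items()}


if __name__ == "__main__":
    for line, ratio in sorted(ratios(TABLE5).items()):
        print(line, round(ratio, 2))
```

A ratio near 1 indicates similar activity in both tissues, while larger values indicate root-biased activity in that line.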
Example 10

Transformation of Maize with pDAB 419
A. Establishment of Type II Callus Targets. 

Two parents of 'High II' (Armstrong and Phillips, (1991)) were crossed and when
the developing embryos reached a size of 1.0-3.0 mm (10-14 days after pollination), the ear
was excised and surface sterilized. Briefly, ears were washed with Liquinox soap
(Alconox, Inc., NY) and subjected to immersions in 70% ethanol for 2-5 minutes and 20%
commercial bleach (0.1% sodium hypochlorite) for 30-45 minutes followed by 3 rinses in
sterile, distilled water. Immature embryos were isolated and used to produce Type II
callus.
For Type II callus production, immature embryos were placed (scutellum-side up)
onto the surface of 'initiation' medium (15Ag10) which included N6 basal salts and
vitamins (Chu, 1978), 20 g/L sucrose, 2.9 g/L L-proline, 100 mg/L enzymatic casein
hydrolysate (ECH), 37 mg/L Fe-EDTA, 10 mg/L silver nitrate, 1 mg/L
2,4-dichloro-phenoxyacetic acid (2,4-D), and 2.5 g/L Gelrite (Schweizerhall, NJ) with pH adjusted to
5.8. After 2-3 weeks incubation in the dark at 28°C, soft, friable callus with numerous
globular and elongated somatic embryo-like structures (Type II) was selected. After 2-3
subcultures on the 'initiation' medium, callus was transferred to 'maintenance' medium
(#4). The 'maintenance' medium differed from the 'initiation' medium in that it contained
690 mg/L L-proline and no silver nitrate. Type II callus was used for transformation
experiments after about 16-20 weeks.

B. Helium Blasting and Selection.

pDAB367 (Example 27) and pDAB 419 were co-precipitated onto the surface of
1.5-3.0 micron gold particles (Aldrich Chem. Co., Milwaukee, WI). pDAB367 contains a
phosphinothricin acetyl transferase gene fusion which encodes resistance to the herbicide
Basta™. This gene is used to select stable transgenic events. The precipitation mixture
included 60 mg of pre-washed gold particles, 140 ug of plasmid DNA (70 ug of each) in
300 uL of sterile water, 74 uL of 2.5 M CaCl2, and 30 uL of 0.1 M spermidine. After
adding the components in the above order, the mixture was vortexed immediately, and
allowed to settle for 2-3 minutes. The supernatant was removed and discarded and the
plasmid/gold particles were resuspended in 1 mL of 100% ethanol and diluted to 7.5 mg
plasmid/gold particles per mL of ethanol just prior to blasting.

Approximately 400-600 mg of Type II callus was placed onto the surface of #4
medium with 36.4 g/L sorbitol and 36.4 g/L mannitol for 4 hours. In preparation for
blasting, the callus was transferred to #4 medium with 2% agar (JRH Biosciences, Lenexa,
KS) and covered with a stainless steel screen (104 micron). Helium blasting was
completed using the same device described in Example 9. Each callus sample was blasted
a total of four times. After blasting, the callus was returned to #4 medium with 36.4 g/L
sorbitol and 36.4 g/L mannitol for 18-24 hours, after which it was transferred to 'selection'
medium (#4 medium with 30 mg/L Basta™ and no ECH or L-proline). The callus was
transferred to fresh 'selection' medium every four weeks for about three months. After
8-12 weeks, actively growing transgenic colonies were isolated and sub-cultured every two
weeks on fresh 'selection' medium to bulk up callus for regeneration.
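
The osmoticum here is given in g/L, whereas the rice protocol in Example 9 gives it in molar terms, so a unit conversion helps compare the two. The sketch below is illustrative; the function name is hypothetical, and the molar mass used (182.17 g/mol for C6H14O6, shared by the isomers sorbitol and mannitol) is a standard value that the text does not state.

```python
HEXITOL_MOLAR_MASS = 182.17  # g/mol for C6H14O6 (sorbitol and mannitol are isomers)


def molarity(grams_per_litre, molar_mass=HEXITOL_MOLAR_MASS):
    """Convert a concentration in g/L to mol/L."""
    return grams_per_litre / molar_mass


if __name__ == "__main__":
    for sugar_alcohol in ("sorbitol", "mannitol"):
        print(sugar_alcohol, round(molarity(36.4), 3), "M")
```

At 36.4 g/L each component works out to roughly 0.2 M, the same concentration used for the rice osmoticum treatment in Example 9.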
C. Histochemical GUS Assay.

Basta™-resistant callus was analyzed for GUS expression by incubating a 50 mg
sample in 150 uL of assay buffer for 48 hours at 37°C. The assay buffer consisted of 0.2 M
sodium phosphate pH 8.0, 0.5 mM each of potassium ferricyanide and potassium
ferrocyanide, 10 mM sodium EDTA, 1.9 mM 5-bromo-4-chloro-3-indolyl-b-D-glucuronide,
and 0.06% v/v Triton X-100 (Jefferson et al., 1987). Transgenic callus
expressing the GUS gene turned blue. A total of 17 Basta™-resistant callus lines were
established for maize, with three maize lines expressing the GUS gene as evidenced by
blue staining after 48 hours in the GUS histochemical assay.

Table 6: Histochemical GUS Staining of Transgenic Maize Callus

Line          Rating
311/419-01    +
311/419-02    +++
311/419-16    +++

+ = Occasional blue region
++ = Light blue staining throughout
+++ = Dark blue regions
++++ = Intense blue staining throughout

There was considerable variability in intensity of staining among the expressing
callus, ranging from very intense to somewhat spotty (Table 6). Generally, callus staining
was more intense in rice than in maize.

D. Plant Regeneration. 

GUS-expressing callus was transferred to 'induction' medium and incubated at
28°C, 16/8 light/dark photoperiod in low light (13 mE/m2/sec) for one week followed by
one week in high light (40 mE/m2/sec) provided by cool white fluorescent lamps. The
'induction' medium was composed of MS salts and vitamins (Murashige and Skoog
(1962)), 30 g/L sucrose, 100 mg/L myo-inositol, 5 mg/L 6-benzylamino purine, 0.025
mg/L 2,4-D, and 2.5 g/L Gelrite (Schweizerhall, NJ) adjusted to pH 5.7. Following this
two-week induction period, the callus was transferred to 'regeneration' medium and incubated
in high light (40 mE/m2/sec) at 28°C. The 'regeneration' medium was composed of MS
salts and vitamins, 30 g/L sucrose, and 2.5 g/L Gelrite (Schweizerhall, NJ) adjusted to pH
5.7. The callus was sub-cultured to fresh 'regeneration' medium every two weeks until
plantlets appeared. Both 'induction' and 'regeneration' medium contained 30 mg/L
Basta™. Plantlets were transferred to 10 cm pots containing approximately 0.1 kg of dry
Metro-Mix (The Scotts Company, Marysville, OH), moistened thoroughly, and covered
with clear plastic cups for approximately 4 days. At the 3-5 leaf stage, plants were
transplanted to 5-gallon pots and grown to maturity.
E. Southern Analysis.

A DNA probe specific for the β-glucuronidase (GUS) coding region was excised
from the pDAB418 plasmid using the restriction enzymes NcoI and SstI. The 1.9 kb
fragment was purified with the Qiaex II DNA purification kit (Qiagen Inc., Chatsworth,
CA). The probe was prepared using an oligo-labeling kit (Pharmacia LKB, Piscataway,
NJ) with 50 microcuries of α-32P-dCTP (Amersham Life Science, Arlington Heights, IL).
Southern analysis was used to identify maize callus material that contained an intact copy
of the transgene and to measure the complexity of the integration event. The callus material
was removed from the media, soaked in distilled water for 30 minutes and transferred to a
new petri dish, prior to lyophilization. Genomic DNA from the callus was prepared from
lyophilized tissue as described by Saghai-Maroof et al. (1984). Eight micrograms of each
DNA was digested with the restriction enzyme XbaI using conditions suggested by the
manufacturer (Bethesda Research Laboratory, Gaithersburg, MD) and separated by agarose
gel electrophoresis. The DNA was blotted onto nylon membrane as described by Southern
(1975, 1980). The GUS probe was hybridized to the genomic DNA on the blots. The blots
were washed at 60°C in 0.25X SSC and 0.2% SDS for 45 minutes, blotted dry and exposed
to XAR-5 film overnight with two intensifying screens.

F. Screening of R0 Plants for Uniform Expression.
The 6th leaf was collected from five or six "V6-equivalent" stage plants (because the
exact leaf number of R0 plants could not be determined, a plant characteristic of the V6
stage was used). The entire leaf was removed, cut into pieces and stored in a plastic bag at
-70°C until further processing. Leaves were powdered in liquid nitrogen and tissue
samples representing approximately 400 µL of tissue were placed in microfuge tubes. The
tissue was either stored or extracted immediately. GUS was extracted by mixing the
powdered tissue with GUS Lysis Buffer (Jefferson, 1987) as modified by the addition of
1% polyvinylpyrrolidone (hydrated in the buffer for at least one hour), 20% glycerol, 50
mg/mL antipain, 50 mg/mL leupeptin, 0.1 mM chymostatin, 5 mg/mL pepstatin and 0.24
mg/mL Pefabloc™ (Boehringer Mannheim, Indianapolis, IN). After incubation on ice for
at least 10 min, the samples were centrifuged at 16,000g for 10 min. The supernatants were
recovered and centrifuged a second time as described above. The supernatants were
recovered, frozen on dry ice and stored at -70°C. Experiments showed that GUS
activity was stable for at least 4 freeze-thaw cycles when stored in the buffer described
above. GUS activity was measured using a GUS-Light™ kit (Tropix, Inc, Bedford, MA).
Five-µL samples of undiluted extract, or of extract diluted so that the luminescence was
within the range measured by the luminometer, were added to 195 µL of the GUS-Light™
Reaction Buffer. After 1 hr the luminescence was measured using a BioOrbit 1251
luminometer equipped with a BioOrbit 1291 injector after injection of 300 nL of GUS-Light™
Accelerator. Luminescence was integrated for 5 sec after a 5 sec delay. Protein
was measured with the assay developed by Bradford (1976) using human serum albumin as
the standard.
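
The normalization of luminescence to extracted protein, which underlies the "Relative Light Units/mg Protein" values reported below, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the calculation actually used: the function name, its parameters and the example numbers are hypothetical, and the sketch assumes the protein concentration was measured on the undiluted extract.

```python
def specific_activity(rlu: float, dilution_factor: float,
                      sample_volume_ul: float, protein_mg_per_ml: float) -> float:
    """Return relative light units per mg of extracted protein.

    rlu               -- integrated luminescence of the assayed sample
    dilution_factor   -- fold dilution of the extract before assay (1 = undiluted)
    sample_volume_ul  -- volume of (diluted) extract added to the reaction, in uL
    protein_mg_per_ml -- protein concentration of the undiluted extract (assumed)
    """
    if dilution_factor <= 0 or sample_volume_ul <= 0 or protein_mg_per_ml <= 0:
        raise ValueError("all quantities must be positive")
    # Amount of protein actually present in the assayed volume.
    protein_mg = protein_mg_per_ml * (sample_volume_ul / 1000.0) / dilution_factor
    return rlu / protein_mg


# Hypothetical example: a 5 uL sample of a 10-fold diluted extract at 2.0 mg/mL protein.
example = specific_activity(rlu=2500, dilution_factor=10,
                            sample_volume_ul=5, protein_mg_per_ml=2.0)
print(f"{example:.0f} RLU/mg protein")
```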

G. Organ-Specific Expression Quantitative Analyses.

Plants grown in the greenhouse in 5 gallon pots were harvested to determine organ-specificity
of GUS expression. Prior to harvesting tissue from V6-equivalent plants, roots
were cut approximately one inch from the side of the pot to remove any dead root tissue.
Roots from VT stage (mature) plants were washed and any dead root tissue was removed
before freezing at -70°C. Leaves, stems (VT-stage plants only) and roots were harvested
and either frozen at -70°C or powdered in liquid nitrogen immediately. Experiments
showed that GUS is stable in frozen tissue. After powdering the tissues, three aliquots of
approximately 10 ml of tissue were collected into preweighed tubes, and the tubes with
tissue were weighed and stored at -70°C. Tissue was extracted in the same buffer as described
above except that protease inhibitors were added only to aliquots of the extracts instead of to the
entire extract volume. For extraction, the powdered tissues were thawed into 4 ml buffer/g
tissue and homogenized for 5-10 sec at 8,000 rpm using an Ultra-Turrax T 25 (IKA-Works,
Inc.) homogenizer with an 18 mm probe. The samples were centrifuged at 4°C for 5 min at
2015g. After removing the supernatants, the pellets were extracted again but with 2 ml
buffer/g tissue and the supernatant after centrifugation was pooled with the supernatant
from the first extraction. The pellet was extracted again with 2 ml/g tissue; the supernatant
after centrifugation was processed separately from the pooled supernatants from the first
two extractions. GUS activity recovered in the final extract was used to determine
extraction efficiency of the first two extractions. GUS and protein assays were done as
described above for both sets of supernatants. Roots at each node from V7 plants grown in
approximately 15 gallon pots were analyzed separately as described above.
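
The extraction efficiency described above (recovery in the pooled first two extractions relative to the total recovered in all three) can be expressed as a short calculation. The following is a sketch only; the function name and the example activities are hypothetical and are not data from these experiments.

```python
def extraction_efficiency(pooled_first_two: float, third_extract: float) -> float:
    """Percent of total recovered GUS activity found in the first two extractions.

    pooled_first_two -- GUS activity in the pooled supernatants of extractions 1 and 2
    third_extract    -- GUS activity in the separately processed third extraction
    """
    total = pooled_first_two + third_extract
    if total <= 0:
        raise ValueError("total recovered activity must be positive")
    return 100.0 * pooled_first_two / total


# Hypothetical activities in arbitrary light units.
efficiency = extraction_efficiency(pooled_first_two=868.0, third_extract=132.0)
print(f"extraction efficiency: {efficiency:.1f}%")
```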
H. Histochemical Analyses: Staining of Maize Tissues.

Histochemical analysis of per5adh/GUS/nos gene expression was done essentially
as described by Jefferson (1987). Roots were first treated 1 h at 37°C in 100 mM NaPO4
buffer, pH 7.0, 10 mM EDTA, 0.1% Triton X-100 and 10 mM β-mercaptoethanol. The
root sections were washed 3 times with the same buffer but without β-mercaptoethanol and
then incubated 1 hr in the same buffer at 37°C. GUS histochemical assay buffer (Jefferson
(1987)) was added and the tissues were incubated for various times at 37°C. Roots from V6
and VT plants were removed from each node and treated separately. Roots from each node
of V6 plants were measured, cut into 6 equal parts, and two 1-cm pieces were
removed from the ends of each root section. One root piece from each section was stained
until the ends were blue; the other piece from each section was stained overnight. Roots
from VT plants were stained similarly, but two roots from each node, if available, were cut
into several pieces and stained together. One root from each node was stained until the
roots turned blue; the other root from each node was stained overnight. One intact leaf was
removed from the bottom, middle and top of the V6 and VT plants and analyzed. The
leaves were cut lengthwise. The leaf half containing the midrib was transversely cut at
intervals across the midrib and along the outer edge of the leaves. The leaves were vacuum
infiltrated with GUS histochemical assay buffer and incubated at 37°C until stained regions
were visible. Chlorophyll was removed by incubation in 70% ethanol at room temperature.
Pieces of stems that included a node and adjacent internodal regions were cut from the
bottom, middle and top sections of VT plants. Cross sections of the internodal regions and
longitudinal sections that included the node and internodal regions above and below the
node were stained. One longitudinal and one cross sectional piece of each stem region
analyzed was stained until blue was visible; another set of stem pieces was stained
overnight. After staining, the stem pieces were placed in 70% alcohol to remove
chlorophyll. Pollen was collected from transgenic per5adh/GUS/nos plants for 2 hr from
tassels from which all extruded anthers were removed. Pollen was stained overnight.
Kernels were analyzed 20 days post-pollination from crosses done in which the transgenic
plant was the male parent and from crosses in which the transgenic plant was the female
parent. The kernels were dissected longitudinally through the embryo.
I. Screening of R0 Plants for Uniform Expression.

To define the spatial and temporal expression patterns of a promoter of interest, the
expression pattern of a transgene must not be affected by its chromosomal location.
Evidence suggests that transgene expression can be "silenced" non-uniformly in different
parts of plants, resulting in spatial and temporal expression patterns that do not represent
the true promoter activity in transgenic plants. Gene silencing often occurs stochastically,
occurring to different extents in individuals within a population (reviewed by Matzke et al.
(1993)). All transformation events were screened for uniform expression among five or six
R0 plants for each event (Table 7), thus eliminating transformation events that display
silencing of the transgene in a population of this size. GUS expression among the R0 plants
analyzed for each of the three transformation events reported here was statistically
indistinguishable.

Table 7. Expression of GUS with pDAB 419 in Individual R0 Plants in Three Transformation Events

                              TRANSFORMATION EVENTS
308/419-01 (a)                419-02                        419-16
Relative Light  Standard      Relative Light  Standard      Relative Light  Standard
Units/mg        Deviation     Units/mg        Deviation     Units/mg        Deviation
Protein         (b)           Protein         (b)           Protein         (b)
24973           853           5261            562           1011            97
23811           641           4537            381           1039            14
29747           -             5055            573           1213            9
24081           614           5743            137           942             12
25729           199           4645            315           1367            -
27025           -             -               -             1282            46

(a) only one sample was analyzed for some of the 308/419-01 plants
(b) standard deviations were determined from independent analyses of two aliquots of tissue from each plant
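
As a simple illustration of how plant-to-plant uniformity within an event might be summarized, the per-plant values printed in Table 7 can be reduced to a mean, a sample standard deviation and a coefficient of variation for each event. This sketch does not reproduce whatever statistical comparison was applied in the original analysis; the function name is hypothetical.

```python
from statistics import mean, stdev

# Relative light units/mg protein for individual R0 plants, as printed in Table 7.
rlu_per_mg = {
    "308/419-01": [24973, 23811, 29747, 24081, 25729, 27025],
    "419-02": [5261, 4537, 5055, 5743, 4645],
    "419-16": [1011, 1039, 1213, 942, 1367, 1282],
}


def summarize(values):
    """Return (mean, sample standard deviation, coefficient of variation in %)."""
    m = mean(values)
    s = stdev(values)
    return m, s, 100.0 * s / m


summary = {event: summarize(vals) for event, vals in rlu_per_mg.items()}
for event, (m, s, cv) in summary.items():
    print(f"{event}: mean={m:.0f}, sd={s:.0f}, CV={cv:.1f}%")
```

A low coefficient of variation within an event is the kind of pattern that would be expected if no plant in that event showed marked silencing.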

J. Quantitative Analyses of pDAB 419 Maize Plants.

Quantitative analysis of GUS activity was done at two stages of corn development:
V6 (whorl stage) and VT (tassel emergence). Entire leaf, stem or root samples were
powdered and duplicate aliquots were analyzed. GUS activity was determined relative to
either extracted protein concentration or to fresh weight of tissue. The high percent
recovery of GUS activity indicates that the extraction procedure for GUS is efficient (Tables 8 and
9). The 308/419-01 and 419-02 plants are BC1 (crossed consecutively with the same
inbred twice) and R0 generations, respectively. The per5adh promoter is expressed in
root, stem (VT plants) and leaf tissue (Tables 8 and 9). When normalized to extractable
protein, roots express higher levels of GUS than leaves in V6 and VT plants; stem
accumulates GUS at levels higher than either leaves or roots in VT plants (Tables 8 and 9).
GUS expression normalized to fresh weight of tissue and expression normalized to
extractable protein levels follow similar trends of organ-specificity of expression in VT
plants, although the relative proportions of expression among the organs are different. In
V6 plants, the per5adh promoter expresses GUS at similar levels in leaves and roots based
on fresh weight of tissue, but the promoter clearly expresses GUS higher in roots than in
leaves when expression is normalized to extractable protein.



Table 8. Expression of Per5adh/GUS/nos in V6 Transgenic Plant Organs

Plant Organ   Relative Light   Standard        Relative Light   Standard        Average Percent
              Units/mg         Deviation (a)   Units/g Tissue   Deviation (a)   Extraction
              Protein                          (-1000)                          Efficiency (b)
308/419-02
  leaves      5,518            155             39,687           4,231           86.8
  roots       15,496           2,918           33,155           7,620           91.1
419-02
  leaves      3,256            111             23,367           1,704           85.8
  roots       8,871            35              14,316           333             89.3

(a) standard deviations were determined from independent analyses of two aliquots of tissue from each sample
(b) extraction efficiency was percent recovery of GUS activity in the first two extractions relative to the total
GUS activity in all three extractions of the tissues

Table 9. Expression of Per5adh/GUS/nos in VT Transgenic Plant Organs

Plant Organ   Relative Light   Standard        Relative Light   Standard        Average Percent
              Units/mg         Deviation (a)   Units/g Tissue   Deviation (a)   Extraction
              Protein                          (-1000)                          Efficiency (b)
308/419-02
  leaves      2,915            177             30,426           1,567           87.3
  stem        15,701           837             35,601           593             85.2
  roots       10,197           351             15,393           310             82.8
419-02
  leaves      2,319            15              18,112           1,305           86.7
  stem        14,721           165             32,619           747             84.0
  roots       3,923            734             6,473            814             83.1

(a) standard deviations were determined from independent analyses of two aliquots of tissue from each sample
(b) extraction efficiency was percent recovery of GUS activity in the first two extractions relative to the total
GUS activity in all three extractions of the tissues
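
The dependence of the apparent organ specificity on the normalization basis can be illustrated with the V6 values printed for the 308/419-02 rows of Table 8. The sketch below simply forms root-to-leaf ratios on each basis; the dictionary keys and function name are hypothetical labels chosen for this illustration.

```python
# V6 values for the 308/419-02 rows as printed in Table 8.
table8_308_419_02 = {
    "leaves": {"per_mg_protein": 5518, "per_g_tissue": 39687},
    "roots": {"per_mg_protein": 15496, "per_g_tissue": 33155},
}


def root_to_leaf_ratio(data: dict, basis: str) -> float:
    """Ratio of root to leaf GUS activity on the given normalization basis."""
    return data["roots"][basis] / data["leaves"][basis]


ratio_protein = root_to_leaf_ratio(table8_308_419_02, "per_mg_protein")
ratio_fresh = root_to_leaf_ratio(table8_308_419_02, "per_g_tissue")
print(f"root/leaf by extractable protein: {ratio_protein:.2f}")
print(f"root/leaf by fresh weight:        {ratio_fresh:.2f}")
```

On these values the ratio is well above 1 when normalized to extractable protein and close to 1 when normalized to fresh weight, consistent with the description of V6 plants given above.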

The per5adh promoter activity was examined in detail in roots. For these
experiments, 308/419-01 plants were grown in 15 gallon pots to improve root quality.
Roots at all nodes express GUS, but the GUS activity/mg extractable protein increases in
nodes 3-5 relative to expression in nodes 1 and 2 (Table 10).



Table 10. Expression of GUS with pDAB 419 in Transgenic Plant Root Nodes

Root Node   Relative Light Units/mg Protein   Standard Deviation (a)
node 1      5,479                             -
node 2      4,268                             297.5
node 3      6,836                             47.3
node 4      8,148                             92.6
node 5      10,887                            305.9

(a) standard deviations were determined from independent analyses of two aliquots of tissue from each sample;
only one sample was available for node 1

K. Histochemical Analyses of pDAB 419 Maize Plants. 

The per5adh promoter expresses GUS to levels that are detectable in all tissues
tested using the histochemical staining procedure of Jefferson (1987), with the exception of
kernels (but only when the transgenic plant is used as a pollen donor) and pollen. Roots at
all nodes of these transgenic plants express GUS. GUS is expressed over the entire length
of the roots with the exception that in at least some roots, the expression drops dramatically
at the distal end of the root. The loss of stainable activity in the root ends is not due to
technological limitations of the protocol in that roots from transformation events expressing
transgenes driven by other promoters express highly in these regions. The stem stains for
GUS activity non-uniformly, with the pith showing poor or no staining; the nodes and areas
adjacent to the outer edge of the stem stain. Most of the areas that stain correspond to
regions rich in vascular tissue. The blade, sheath and the midrib of the leaves express
GUS. Kernels do not display any stainable activity in overnight incubations in GUS
histochemical staining solution when the kernels are from crosses using the
per5adh/GUS/nos plants as the pollen donor. However, when the transgenic plant is used
as the maternal parent in the cross, GUS is expressed in the pericarp (seed coat) as well as a
discrete area of the embryo.

Expression patterns of maize plants transformed with pDAB419 were similar to the
expression patterns observed in transgenic rice. The per5 promoter/adh I intron
combination appears to promote a pattern of expression which is constitutive. That is,
significant expression is observed in both roots and leaves. This is unexpected as the per5
gene is natively root-preferentially expressed. This result is consistent with the expression
pattern that was observed in rice.

Example 11
PERGUS16

PerGUS16 is a plasmid containing 4 kb of per5 promoter, the per5 untranslated
leader sequence, the coding sequence for the first five amino acids of per5, the GUS gene,
and the nos 3' UTR. The complete sequence of PerGUS16 is given in SEQ ID NO 15.
With reference to SEQ ID NO 15, significant features of PerGUS16 are given in Table 11.

Table 11: Significant Features of PerGUS16

nt (SEQ ID NO 15)   Features
1-6                 SstI site
37-42               BamHI site
43-48               SalI site
48-53               NcoI site
48-4247             Per5 promoter nt 1-4200 of SEQ ID NO 1 and untranslated leader
4248-4263           Per5 exon nt 4201-4215 of SEQ ID NO 1
4264-6068           β-glucuronidase gene (GUS)
6069-6111           untranslated sequence from pBI221
6122-6127           SstI site
6122-6396           nos 3' UTR
6397-6407           linker
6402-6407           HindIII site
6408-9299           Bluescript® II SK-

PerGUS16 is different from pDAB411 in that PerGUS16 includes the coding
sequence for the first 5 amino acids of the per5 protein. In addition, PerGUS16 contains 4
kB of upstream promoter sequence, whereas pDAB411 only contains 2 kB of sequence.
Neither PerGUS16 nor pDAB411 includes an intron in the untranslated leader. PerGUS16
was constructed and tested in a transient maize root expression assay as follows.

A. Construction of PerGUS16. A 4.0 kB NcoI fragment, containing 4 kB of
upstream per5 sequence, the per5 untranslated leader sequence and the coding sequence for
the first 5 amino acids of per5, from perGEN1(10.4) was purified from a 1.0% agarose gel
using a Qiagen kit. This 4.0 kB promoter fragment was ligated into an NcoI site at the
translation initiation start site of the GUS gene in pGUSnos12. pGUSnos12 is a plasmid
based on Bluescript® II SK- with an inserted BamHI-HindIII fragment containing the
coding region for the GUS gene and the nos 3' UTR. The resultant translation fusion is
PerGUS16.

B. Expression Assay. Results of testing PerGUS16 in a transient maize
root expression assay are given in Table 14.

Example 12
PERGUSPER3

PERGUSPER3 is a plasmid containing 4 kb of per5 promoter, the per5 untranslated
leader sequence, the coding sequence for the first five amino acids of per5, the GUS gene,
and the per5 3' UTR. The complete sequence of PERGUSPER3 is given in SEQ ID NO
16. With reference to SEQ ID NO 16, critical features of PERGUSPER3 are as follows:



Table 12: Significant Features of PERGUSPER3

nt (SEQ ID NO 16)   Features
1-6                 SstI site
1-42                Bluescript SK polylinker
37-42               BamHI site
43-48               XbaI site
43-53               synthetic linker
54-59               NcoI site
54-4253             Per5 promoter nt 1-4200 SEQ ID NO 1
4254-4269           Per5 exon nt 4201-4215 SEQ ID NO 1
4264-4269           NcoI site
4266-6074           β-glucuronidase gene (GUS)
6075-6117           untranslated sequence from pBI221
6135-6140           XhoI site
6140-6510           Per5 3' UTR nt 6069-6439 SEQ ID NO 1
6511-6516           HindIII site
6517-9408           Bluescript® II SK-

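
The coordinate ranges in Table 12 can be cross-checked against the ranges they cite in SEQ ID NO 1 by comparing inclusive lengths. The following sketch is offered only as an illustration of that bookkeeping, using two of the features as printed; the function and variable names are hypothetical.

```python
def span_length(start: int, end: int) -> int:
    """Length in nucleotides of an inclusive, 1-based coordinate range."""
    if end < start:
        raise ValueError(f"end ({end}) precedes start ({start})")
    return end - start + 1


# (range in SEQ ID NO 16, cited range in SEQ ID NO 1), as printed in Table 12.
features = {
    "per5 promoter": ((54, 4253), (1, 4200)),
    "per5 3' UTR": ((6140, 6510), (6069, 6439)),
}

lengths = {
    name: (span_length(*construct), span_length(*source))
    for name, (construct, source) in features.items()
}
for name, (in_construct, in_source) in lengths.items():
    print(f"{name}: {in_construct} nt in construct, {in_source} nt in source")
```

For both features the two ranges have the same inclusive length, as would be expected if the segment was carried over intact.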
PERGUSPER3 is identical to PerGUS16 except for its 3' UTR. PerGUS16 has the
nos and PERGUSPER3 has the per5 3' UTR. Neither PERGUSPER3 nor PerGUS16 has
an intron in the untranslated leader. PERGUSPER3 was constructed and tested in a
transient maize root assay, in stable transformed rice callus, and in stable transformed rice
plants as follows.

A. Construction of PERGUSPER3

1. BSGUSper4. The 3' UTR from the per5 gene was amplified on a 396 bp
fragment (corresponding to bp 6069 to 6439 of SEQ ID NO 1 plus 26 bases of synthetic
linker sequence) from the plasmid perGEN1(10.4) using Amplitaq polymerase with buffers
supplied and synthetic primers,

tt a TPTrr, a n nnc A CTG A AGT f ^^TTG A TGTGCTG AATT (SEQ ID NO 17) and 
rzr.r.n a a nr TT PTPT A G A TTTGGA TATA TGCCGTQ A A ^ A ATTO (SEQ ID NO 18). 
The 5' primer added an XhoI restriction site, and the 3' primer included a HindIII site, to
facilitate cloning. This fragment contains a canonical AAUAAA poly-A addition signal at
position 247 (corresponding to bp 6306 of SEQ ID NO 1). The amplification product was
ligated into an XhoI/HindIII site of plasmid pDAB356/X [Note: The structure of plasmid
pDAB356/X is not directly relevant to the end result of this construction series. It was
constructed during an unrelated series, and was chosen because it contained restriction
recognition sites for XhoI and HindIII at the 3' end of the GUS coding region. Those
skilled in the art will realize that other plasmids can be substituted at this step with
equivalent results.] and transformed into DH5α. Ampicillin resistant transformants were
screened by colony hybridization using the per5 3' UTR amplification product as a probe.
Three of the resulting transformants hybridized to 32P-radiolabelled 3' UTR
amplification product. The plasmid from each of these three transformants was extracted
for sequence analysis. Sequence analysis using an Applied Biosystems automated
sequencer revealed that a clone designated p3'per26 was free of PCR induced errors. A 2.0
kB BamHI/HindIII fragment from p3'per26 containing the GUS-per5 3' UTR was gel
purified as described above and ligated into the BamHI/HindIII cloning site of Bluescript®
II SK-. One of the resulting plasmids, designated BSGUSper4, was characterized and
selected for subcloning.

2. PERGUSPER3. The 4.0 kB NcoI per5 promoter fragment from
perGEN1(10.4) described above was ligated into the NcoI site of BSGUSper4 (the
translational initiation of the GUS gene). The resultant clone, PERGUSPER3, contains 4
kB of per5 promoter, the per5 untranslated leader sequence, the first 5 amino acids of per5,
the GUS gene, and the per5 3' UTR.

B. Expression Assays. Results of testing PERGUSPER3 in a transient maize root
assay are given in Table 14. Results of testing PERGUSPER3 in stable
transformed rice callus and rice plants are given in Table 15.

Example 13 

5' Deletions of PERGUSPER3

A series of 5' deletions of PERGUSPER3 was assembled to test the effect on
expression. Construction of these vectors utilized naturally occurring restriction sites in
the 4.0 kB NcoI promoter region.

A. Construction of SPGP1 

SPGP1 is identical to PERGUSPER3 except for the absence of 2 kB of 5' upstream
sequence (i.e., bp 25 to 2585 of SEQ ID NO 16 are deleted). SPGP1 was derived from
PERGUSPER3 by subcloning the XbaI fragment of PERGUSPER3 into the XbaI site of
Bluescript® II SK-.

B. Construction of HSPGP4

HSPGP4 is identical to SPGP1 except for the absence of 1 kB of 5' upstream
sequence (i.e., bp 25 to 3240 of SEQ ID NO 16 are deleted). This vector was derived from
SPGP1 by the deletion of the 1 kB HindIII fragment.

C. Construction of PSPGP1

PSPGP1 is identical to SPGP1 except for the absence of 1.9 kB of PstI sequence
(i.e., bp 25 to 4139 of SEQ ID NO 16 are deleted). PSPGP1 only had 109 bases of 5'
sequence which includes the TATA box.

D. Expression Assay. Results of testing SPGP1, HSPGP4 and PSPGP1 in a 
transient maize root expression assay are given in Table 14. 

Example 14 

Transient Root Expression Assay 

Transient assays have been successfully used for studying gene expression in plants,
especially where an efficient stable transformation system is not available (i.e., maize,
wheat). In protoplasts, these assays have been used to study the expression of regulatory
elements with relatively simple expression patterns. For example, constitutive promoters,
including the CaMV 35S, have been extensively studied in maize protoplasts (Luehrsen
and Walbot (1991)). However, it was believed that a root preferential promoter, such as
per5, would be unlikely to function normally in protoplasts, particularly those derived from
tissue culture. Therefore, a system to study expression in intact root tissue was desirable.
Particle bombardment of root tissue would enable transient expression analysis and reduce
the need for production of stable transgenics.

A. mimLmmmJlMM^. Captan™-treated seed of CQ806 and OQ403
were soaked for 45 min., rinsed 3 times in sterile distilled water, and germinated in sterile
petri dishes (100x25 mm) containing Whatman #1 filter paper moistened with sterile Milli-Q
water for about 4-7 days. Approximately 1 cm size root tips were excised and arranged
(6 per target) in 'blasting' medium (#4 with 2% agar). The 'blasting' medium consisted of
N6 basal salts and vitamins (Chu, 1978), Fe-EDTA, 20 g/L sucrose, 690 mg/L L-proline,
100 mg/L enzymatic casein hydrolysate (ECH), 1 mg/L 2,4-dichlorophenoxyacetic acid
(2,4-D), and 20 g/L agar. The roots were covered with a 204 micron screen prior to
blasting. Each target was blasted once at 1,500-2,000 psi using two times dilution of
gold/DNA solution. The gold particles (Biorad 1.0 micron) were coated with DNA
(different plasmids as mentioned in the text) as described in Example 10B. Different
blasting parameters, i.e., 1) different helium pressures (500, 1,000, 1,500, and 2,000 psi), 2)
number of blastings per target (1-4 blastings per target), 3) concentration of gold/DNA (1-4
times dilutions of gold/DNA solution), 4) particle size (Aldrich 1.5-3.0 micron vs. Biorad
1.0 micron gold particles), and 5) high osmoticum treatment (0.2 M mannitol and 0.2 M
sorbitol treatment 4 h prior to and 16-18 h after blasting) were tested. Following blasting,
roots were transferred to 15Ag10-2D medium and incubated in the dark at 27°C. The
15Ag10-2D medium differed from #4 medium in that it contained 2.9 g/L L-proline,
10 mg/L silver nitrate, 2 mg/L 2,4-D, and 2.5 g/L Gelrite.

B. Histochemical GUS Assay. After 18-24 hrs, the blasted roots were assayed
for transient GUS expression according to Jefferson (1987). Roots were placed in 24-well
microtitre plates (Corning, New York, NY) containing 500 µL of assay buffer per well (six
per well). The assay buffer consisted of 0.1 M sodium phosphate (pH 8.0), 0.5 mM
potassium ferricyanide, 0.5 mM potassium ferrocyanide, 10 mM sodium EDTA, 1.9 mM
5-bromo-4-chloro-3-indolyl-beta-D-glucuronide, and 0.06% Triton X-100. The plates were
incubated in the dark for 1-2 days at 37°C before observations of GUS expression under a
microscope.

C. Optimization of DNA Delivery into Roots. Transient expression increased
with increased helium pressure, with highest levels observed at 1,500-2,000 psi. High
osmoticum treatment prior to blasting did not enhance GUS expression. Also, increasing
the number of blastings per target did not result in increased expression. One blasting per
target yielded highest expression in roots of both OQ403 and CQ806. In addition, two
times dilution of gold/DNA solution and use of the Biorad 1.0 micron particles were found
to be most suited for obtaining consistently high levels of expression. Based on these
results, a set of conditions was established for blasting into roots. With these conditions,
60-100% of the blasted roots expressed GUS with an average number of ca. 50 GUS
expression units per target using pDAB418 (Ub1-GUS-nos).

D. Transient Expression of Different per5 Constructs in Roots. Transient GUS
expression of different per5 constructs was tested in roots following helium blasting using
the conditions described above. The results from ten different experiments are summarized
in Table 14.

TABLE 14. Transient expression of different per5 constructs in roots.

Plasmid      Description                                         #GEUs*   (N)†   Rating
PerGUS16     4.5 kB per5, first 5 aa of per5 protein-GUS-nos     3.4      (24)   ++
PERGUSPER3   4.5 kB per5, first 5 aa of per5 protein-GUS-per5    10.0     (24)   ++++
SPGP1        2.0 kB per5, first 5 aa of per5 protein-GUS-per5    10.7     (24)   ++++
HSPGP        1.0 kB per5, first 5 aa of per5 protein-GUS-per5    5.8      (15)   +++
PSPGP        0.1 kB per5, first 5 aa of per5 protein-GUS-per5    10.8     (16)   ++++
pDAB411      2.0 kB per5-GUS-nos                                 1.1      (5)    +
pDAB419      2.0 kB per5, Adh1 intron1-GUS-nos                   6.7      (3)    +++

* GUS expression units (number of blue spots observed) per target
† N = # of targets blasted

pDAB411, the construct containing 2.0 kB per5, expressed at very low levels. With
PerGUS16 containing 4.0 kB per5 and a fusion including the first five amino acids of the
per5 protein, the expression was 3-fold higher than that of pDAB411. Further,
PerGUSper3, consisting of per5 with the 3' UTR, showed a further 3-fold increase over
PerGUS16, demonstrating that the 3' end is also important for regulation of expression.
Although SPGP1 contained 2.0 kB of per5, no difference was observed between the
expression of SPGP1 and PerGUSper3. With additional deletion in the 5' region of per5 in
HSPGP (which contains 1.0 kB of per5), expression was decreased over that of SPGP1
and PerGUSper3. However, relatively high levels of expression were observed with
PSPGP containing only 0.1 kB region of per5.
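
The fold differences discussed above follow directly from the mean GEU values printed in Table 14. The short sketch below recomputes them for illustration; the helper name is hypothetical, and no account is taken of the differing numbers of targets per construct.

```python
# Mean GUS expression units (GEUs) per target, as printed in Table 14.
geus = {
    "PerGUS16": 3.4,
    "PERGUSPER3": 10.0,
    "SPGP1": 10.7,
    "HSPGP": 5.8,
    "PSPGP": 10.8,
    "pDAB411": 1.1,
    "pDAB419": 6.7,
}


def fold_change(construct: str, reference: str) -> float:
    """Ratio of mean GEUs for `construct` relative to `reference`."""
    return geus[construct] / geus[reference]


comparisons = [
    ("PerGUS16", "pDAB411"),
    ("PERGUSPER3", "PerGUS16"),
    ("SPGP1", "PERGUSPER3"),
    ("HSPGP", "SPGP1"),
]
for construct, reference in comparisons:
    print(f"{construct} vs {reference}: {fold_change(construct, reference):.2f}-fold")
```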

Probably all of the promoter elements which are necessary for maximal root
specific expression are present in the first 1 kB of 5' sequence. However, elements which
may suppress expression in other tissues may not be present in this 1 kB sequence. Similar
observations have been made with the 5' upstream sequences of the Sus4 gene from potato,
which contains a negative element that suppresses expression in stems and leaves (Fu et al.
(1995)). Transient assays in other tissues would be necessary to obtain this information
from the per5 constructs. The 5' sequence in PSPGP, which is only 100 bases long,
probably acts as a basal promoter and, therefore, would not be expected to
contain the elements necessary for root specific expression nor enhancer elements
necessary for maximal activity of the promoter. Expression from this construct in stable
plants would be expected to be constitutive.

A translational fusion of the per5 gene which included the per5 5' untranslated
leader (UTL) and the first 5 amino acids of the per5 gene fused to the uidA gene was included in
the PerGUS16, PERGUSPER3, SPGP1, HSPGP, and PSPGP constructs. The ability of these
constructs to express GUS demonstrated that this UTL sequence was capable of promoting
translation and therefore can be used to express commercially important transgenes.

The most obvious improvement in expression was observed from the addition of the per5 3' UTR in place of the nos sequence. 3' UTRs are known to contain sequences that affect gene expression by altering message stability (Sullivan and Green (1993)) or influencing translation (Jackson and Standart (1990)). Examples include polyadenylation signals (Rothnie et al. (1994)) and destabilizing elements (Gallie et al. (1989)). However, the per5 and nos 3' UTRs cannot be distinguished by the presence or absence of these sequences. Both UTRs contain a canonical AAUAAA poly-A addition signal. Neither sequence appears to contain any of the published destabilizing elements. An obvious difference between the two UTRs is the length; the longer per5 UTR may confer greater stability of the message.
Example 15
Rice Transformation of PERGUSPER3
Transgenic Production and Histochemical GUS Assay

To study the expression of PerGUSPer3 in transgenic rice, a total of 35 independent transgenic lines were produced. Of these, plants of 9 lines (354/PERGUSPER3-03, 20, 21, 23, 24, 27, 28, 30, and 34) displayed GUS expression in roots. Although GUS expression varied from line to line, a few lines showed very intense expression in roots. Histochemical GUS analysis of different tissues following vacuum infiltration showed GUS expression in cut portions of leaves, glumes, anthers, pollen and embryo. No expression was seen in endosperm. Together, these results suggest that per5 is expressed in a constitutive manner in rice.

Rice plants from six PERGUSPER3 R0 lines were characterized by Southern analysis. The rice DNA was cut with the restriction enzyme XbaI, which should result in a 4.2 kb fragment when hybridized to the GUS probe. All six lines contain the gene construct. A moderately complex integration event was detected in one of the six lines containing an intact copy of the gene construct. The remaining five lines all had complex integration events with as many as nine hybridization products. A summary of the genetic analysis is given in Table 15.



Table 15: Assay of Transformed Rice Plants 
Both longitudinal and transverse root sections prepared from transgenic rice seedlings showed cells with GUS expression (blue color) and cells interpreted to lack GUS expression (red color resulting from the counterstain). A longitudinal section of a primary root showed GUS expression in all cells except those in the root cap, the meristematic zone, and a portion of the cell elongation zone. This pattern of expression was confirmed for secondary root formation in a transverse section of root tissue. A cross section of a primary root, prepared from within the zones of cell elongation and differentiation, showed most cells expressing GUS. Very intense GUS expression (dark blue) was observed in the exodermis or outer cortex of the root sample. GUS expression was noted as slight to absent in the epidermal layer even though root hairs were observed macroscopically to be blue. Both vascular and cortical tissues showed moderate expression. Based on the consistent staining patterns obtained from free-hand tissue sections, cells in the vascular and cortical tissues genuinely expressed the GUS protein; the staining was not an artifact of diffusion of histochemical stain from the exodermis.

Analysis of variance showed that sample-to-sample variation within each of the independent events was not significant; most of the variation was associated with differences among events. Based on the GUS quantitative data, only event 354/PERGUSPER3-20 was shown to be highly significantly different (p<0.001) from zero (Table 15), even though five other events were shown to be histochemically GUS positive.

The maize per5 5' region in combination with the 3' untranslated sequences promoted high-level expression of the introduced β-glucuronidase gene in young transgenic rice plants. Functional activity was observed in both roots and leaves. Quantitative data indicated that there was considerable variability of expression between the different events. This variability is most likely the result of a combination of factors including position effects of the integrated transgene, differences in copy number of the insertion products, and rearrangements of the insertion events. All of these variables have the potential to affect expression levels and have been documented in most transgenic studies.

Despite the high degree of variability in expression levels, the expression pattern of PerGUSPer3 in different transformation events was consistent. Slight to very intense expression was evident throughout the primary and secondary roots except in the root tips. Histological analysis showed very intense expression in the outer cortex and moderate expression in cortex and vascular tissues. This pattern and level of expression appear well suited to the expression of genes that control root pests (e.g., root weevil). In addition to expression in roots, high levels of expression were also observed in stem and leaf tissue (quantitative data), thus providing an opportunity for controlling other insects (e.g., stem borer). These data demonstrate that the per5 promoter, in the absence of an intron, drives constitutive expression of transgenes in rice.
Example 16
Maize Transformation of PERGUSPER3

Establishment of Type II callus targets and helium blasting conditions were the same as described in Example 10. A total of 82 independent transgenic colonies of maize were produced. Of these, 55 lines were subjected to Southern analysis as described in Example 15. Twenty-nine lines were found to be Southern positive and contained an intact hybridization product of the GUS gene. Following GUS histochemical assay, callus of about 72 lines showed no expression. Also, roots and leaves of different Southern-positive lines displayed no GUS expression when callus was regenerated on the 'regeneration' medium. These data supported the observation that sequences other than the 5' promoter region and the 3' UTR were critical for expression in corn.

Example 17
Plasmid PIGP/367

Plasmid PIGP/367 contains the per5 promoter, the per5 untranslated leader modified to include the per5 intron 1, the GUS gene, and the per5 3' UTR. The complete sequence for PIGP/367 is given in SEQ ID NO 19. With reference to SEQ ID NO 19, critical features of PIGP/367 are given in Table 16.



Table 16: Significant Features of PIGP/367 



nt (SEQ ID NO 19)   Features
1-40                synthetic polylinker
41-75               pCR™2.1 polylinker
81-1741             Per5 promoter nt 2532-4192 SEQ ID NO 1
1742-1747           BglII/BamHI junction
1748-1763           Per5 exon 1 nt 4410-4425 SEQ ID NO 1
1764-2396           Per5 intron nt 4426-5058 SEQ ID NO 1
2397-2405           Per5 exon 2 nt 5059-5067 SEQ ID NO 1
2406-2411           NcoI site
2408-4215           β-glucuronidase gene (GUS)
4217-4264           sequence from pBI221
4280-4652           Per5 3' UTR nt 6067-6439 SEQ ID NO 1
4653-4869           synthetic linker
4870-5121           CaMV DNA nt 7093-7344
5122-5129           linker
5130-5476           CaMV DNA nt 7093-7439
5477-5496           linker
5497-5606           synthetic MSV leader (MSV nt 167-186, 188-277)
5608-5613           BglII/BclI junction
5608-5698           Adh1.S nt 119-209
5699-5820           Adh1.S nt 555-672 plus 4 bases linker sequence
5821-5827           BamHI/BglII junction
5828-5864           MSV nt 278-317
5863-5868           NcoI site
5865-6419           phosphinothricin acetyl transferase gene (Basta™ resistance selectable marker)
6420-6699           nos 3' UTR
6700-9335           pUC19 sequences


Because intron flanking sequences (exon DNA) have been shown to be important in the processing of the intron (Luehrsen and Walbot (1991)), 16 bases of flanking exon DNA were included in the fusion within the per5 untranslated leader.
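
As a quick consistency check on the coordinates in Table 16, the following minimal Python sketch compares the length of each per5-derived segment in SEQ ID NO 19 with the length of the SEQ ID NO 1 range it is said to come from. The coordinate pairs are transcribed from the table; the function and variable names are illustrative only and are not part of the specification.

```python
# Coordinate pairs transcribed from Table 16:
# (feature, range in SEQ ID NO 19, range in SEQ ID NO 1), 1-based inclusive.
FEATURES = [
    ("Per5 promoter", (81, 1741), (2532, 4192)),
    ("Per5 exon 1", (1748, 1763), (4410, 4425)),
    ("Per5 intron", (1764, 2396), (4426, 5058)),
    ("Per5 exon 2", (2397, 2405), (5059, 5067)),
    ("Per5 3' UTR", (4280, 4652), (6067, 6439)),
]


def span_length(span):
    """Length in nucleotides of a 1-based inclusive coordinate range."""
    start, end = span
    return end - start + 1


def check_features(features):
    """Return {feature: length}; raise if the two ranges differ in length."""
    lengths = {}
    for name, construct_span, source_span in features:
        n_construct = span_length(construct_span)
        n_source = span_length(source_span)
        if n_construct != n_source:
            raise ValueError(f"{name}: {n_construct} nt vs {n_source} nt")
        lengths[name] = n_construct
    return lengths


LENGTHS = check_features(FEATURES)
for feature, n in LENGTHS.items():
    print(f"{feature}: {n} nt")
```

Under this reading of the table, the exon 1 segment is the 16 bases of flanking exon DNA mentioned above, and exon 1 plus exon 2 together account for 25 bases of exon sequence flanking the intron.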

Construction of PIGP/367. The promoter from the per5 gene was amplified using the forward primer

Gnnnn-T-T---^ AA ^ ATATACATAGATAAAMCC (SEQ ID NO 20)

which introduces a BamHI (GGATCC) site 5' of the promoter to facilitate cloning. The
reverse primer within the untranslated leader of the per5 gene was

GGG n TrTrrTTrnrxnT— mnMAhQAQAhQhQ (SEQ id no 21) and 

introduced a BglII (AGATCT) restriction site 3'. Sequences homologous to the promoter are underlined. The primers were synthesized on a 394 DNA/RNA Synthesizer (Applied Biosystems, Foster City, CA). Amplification reactions were completed with the Expand™ Long Template PCR System (Boehringer Mannheim, Indianapolis, IN). Plasmid perGen10.44, which contains 10.1 kb of the maize peroxidase gene and untranslated and non-transcribed sequences, was used as the template DNA. Amplifications were cycled with a 56°C annealing temperature. Amplification products were separated and visualized by 1.0% agarose gel electrophoresis. Resulting amplification products were excised from the agarose and the DNA was purified using Qiaex II (Qiagen, Hilden, Germany). The products were ligated into pCR2.1 using the Original TA Cloning Kit (Invitrogen Corporation, San Diego, CA). Recombinant plasmids were selected on Luria agar (Gibco, Bethesda, MD) containing 75 mg/liter ampicillin (Sigma, St Louis, MO) and 40 ml/plate of a 40 mg/ml stock of X-gal (Boehringer Mannheim, Indianapolis, IN). Plasmid DNAs were purified using the Wizard™ plus Miniprep DNA Purification System (Promega, Madison, WI). DNA was analyzed and subcloned with restriction endonucleases and T4 DNA ligase from Bethesda Research Laboratories (Bethesda, MD). The resultant promoter clone was named p121-20.

Intron 1 and 25 bases of flanking exon DNA from the per5 gene were amplified using the forward primer GGGGGATCC T G & rrfiTT TTOTr A AfiOTTT A ATTCTOCTT (SEQ ID NO 22), which introduced a BamHI (GGATCC) site 5' of the exon/intron DNA, and the reverse primer GGGCCATGG ATCGC AGC CCTA^ A Q AT QTA AC AGTGTTQT (SEQ ID NO 23), which introduced an NcoI (CCATGG) site 3' to facilitate fusion at the ATG start codon of the GUS gene. Sequences homologous to the per5 sequence are underlined. Amplification and cloning were completed as described above, with the resultant intron clone named p122-2. The intron was then excised from p122-2 on the BamHI/NcoI fragment and introduced 5' to the GUS gene/per5 3' untranslated region in BSGUSper4. Ligations were transformed into DH5α (Bethesda Research Laboratory, Bethesda, MD) and DNA was extracted as described above. Sequence across the junction was verified using the Dye Terminator Cycle Sequencing Ready Reaction Kit (Perkin Elmer, Foster City, CA) and a 373A DNA Sequencer (Applied Biosystems, Foster City, CA). Computer analysis of the sequences was facilitated by Sequencher™ 3.0 (Gene Codes Corporation, Ann Arbor, MI). The intermediate, p128-1, was then digested with BamHI and ligated to the purified promoter BglII/BamHI fragment from p121-20. To generate a final construct containing the selectable marker gene for Basta™ resistance, the per5 promoter/per5 intron/GUS gene/per5 3' UTR were excised from PIPG147-2 on a PvuII/NotI fragment and introduced into a PmeI/NotI site of pDAB367. pDAB367, which contains the gene for Basta™ resistance, is described in Example 27. The final construct was designated pPIGP/367.
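
The cloning strategy above relies on restriction sites built into the 5' tails of the PCR primers. The short Python sketch below shows one way to confirm that a primer tail carries the intended recognition sequence. It uses only the recognition sequences quoted in the text (GGATCC, AGATCT, CCATGG) and only the legible 5' tails of the two intron primers; the gene-specific 3' portions are not reproduced, and the function name is illustrative.

```python
# Recognition sequences as quoted in the text.
SITES = {"BamHI": "GGATCC", "BglII": "AGATCT", "NcoI": "CCATGG"}

# Only the 5' tails of the intron primers are used here; the gene-specific
# 3' portions of SEQ ID NO 22 and 23 are deliberately left out.
PRIMER_TAILS = {
    "intron forward (SEQ ID NO 22)": "GGGGGATCC",
    "intron reverse (SEQ ID NO 23)": "GGGCCATGG",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each site present in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


for name, tail in PRIMER_TAILS.items():
    print(name, find_sites(tail))
```

Because the NcoI site (CCATGG) contains ATG, placing it at the 3' end of the intron fragment allows the fusion to be made at the GUS start codon, as described above.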

Example 18
Transformation of Maize with pPIGP/367

A. Establishment of Type II Callus Targets. The materials and methods used were the same as in Example 10.

B. Helium Blasting and Selection. The materials and methods used were the same as in Example 10. Thirty-three Basta™ resistant lines, designated pPIGP-01 through pPIGP-33, were obtained.

C. Plant Regeneration. The materials and methods used were the same as in Example 8. Plantlets were regenerated from five of the PIGP/367 transgenic lines (PIGP/367-01, PIGP/367-06, PIGP/367-19, PIGP/367-32 and PIGP/367-33).

D. GUS Histochemical Staining. Tissue from plantlets of pPIGP-01 was histochemically evaluated as described in Example 10. The plantlets showed good GUS expression in the roots except for the root cap, where no expression was observed. No expression was observed in the leaves of these young plants.
E. Protein Extraction and Measurement of GUS. Leaf and root tissue was collected, and analysis for GUS expression was completed, for four of the PIGP/367 transgenic lines (PIGP/367-06, PIGP/367-19, PIGP/367-32 and PIGP/367-33) which showed positive GUS histochemical expression. An untransformed plant at the same stage of development, CS405, served as a negative control. The 6th leaf and cleaned roots (roots were cleaned under cold running tap water and rinsed with distilled water) were collected from 4-5 R0 plants within each transgenic line. The samples were either stored at -70°C or powdered using liquid nitrogen. Fifty-mL tubes, chilled on dry ice, were filled to the 10 mL mark with powdered samples. Protein from each sample was extracted in duplicate. Four volumes/weight of extraction buffer (1% polyvinylpolypyrrolidone (hydrated in the solution for at least one hour), 20% glycerol, 0.7 µL/mL β-mercaptoethanol, 50 mM NaPO4 pH 7.0, 10 mM EDTA, 0.1% Triton X-100, 0.1% sarcosyl, 10 mM β-mercaptoethanol) was added to each sample. Samples were ground using an Ultra-Turrax T 25 (IKA-Works Inc., Staufen i. Br., W. Germany) and kept on ice. Samples were spun at 3000 rpm at 4°C for five minutes. Ten µL/mL of protease inhibitor (50 µg/mL antipain, 50 µg/mL leupeptin, 0.1 mM chymostatin, 5 µg/mL pepstatin, 0.24 µg/mL pefabloc (Boehringer Mannheim, Indianapolis, IN)) was added to the withdrawn sample supernatant. The samples were then spun at 4°C for 10 minutes at 13,000 rpm. The supernatants were withdrawn and stored at -70°C. Protein concentration was measured on a UV-Visible Spectrophotometer (Shimadzu, Kyoto, Japan). Five µL of sample was added to 2.5 mL of protein dye reagent (Sigma Diagnostics, St. Louis, MO) and 100 µL of sterile water. A range of standards was made from protein standard solution (Sigma Diagnostics, St. Louis, MO).
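
The text does not say how absorbance readings were converted to protein amounts. A common approach with a dye-binding assay, sketched below in Python as an assumption rather than the method actually used, is to fit a straight line through the standards and read unknowns off that line. All standard amounts and absorbance values here are hypothetical placeholders chosen only to make the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def protein_ug(absorbance, slope, intercept):
    """Protein amount (in µg) read off the fitted standard curve."""
    return (absorbance - intercept) / slope


# Hypothetical standards (µg protein per assay) and absorbances; not from the source.
STANDARD_UG = [0.0, 2.0, 4.0, 8.0]
STANDARD_ABS = [0.05, 0.15, 0.25, 0.45]

SLOPE, INTERCEPT = fit_line(STANDARD_UG, STANDARD_ABS)

# A hypothetical sample reading; 5 µL of extract was assayed, so divide by 5
# to express the result per µL of extract.
sample_ug = protein_ug(0.30, SLOPE, INTERCEPT)
print(f"{sample_ug:.2f} µg in assay, {sample_ug / 5:.2f} µg/µL extract")
```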

GUS activity was measured using a GUS-Light™ Kit (Tropix Inc., Bedford, MA) in replicate samples of the duplicate extractions. Five-µL samples of undiluted extract, or of extract diluted so that the luminescence was within the range measured by the luminometer, were added to 195 µL of the GUS-Light™ Diluent Solution. After a 1 hr incubation at 28°C in the dark, luminescence was measured using a Bio Orbit 1251 luminometer, equipped with a Bio Orbit 1291 injector, after injection of 300 µL of GUS-Light™ Accelerator. Luminescence was integrated for 5 sec after a 5 sec delay. The standards used were extraction buffer, non-transformed tissue stock and GUS-Light™ Gus Standard. The results are summarized in Table 17 and showed high levels of expression in the roots, but low to no significant expression in the leaves.

Table 17: Expression of GUS with PIGP/367 in Plants from Four Transformation Events

Line           Leaf (RLU/µg protein)   Root (RLU/µg protein)
PIGP/367-06    734                     5735
PIGP/367-19    49                      5745
PIGP/367-32    8                       349
PIGP/367-33    72                      1586
CS405          1                       13
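
To make the root preference in Table 17 concrete, the following Python sketch computes a root-to-leaf ratio for each transgenic line. The values are transcribed from the table, reading the first numeric column as leaf and the second as root; the ratio itself is an illustrative summary and is not reported in the specification.

```python
# (leaf, root) GUS activity in RLU/µg protein, transcribed from Table 17.
TABLE_17 = {
    "PIGP/367-06": (734, 5735),
    "PIGP/367-19": (49, 5745),
    "PIGP/367-32": (8, 349),
    "PIGP/367-33": (72, 1586),
    "CS405": (1, 13),  # untransformed negative control
}


def root_to_leaf(table, control="CS405"):
    """Root/leaf activity ratio for every line except the negative control."""
    return {
        line: root / leaf
        for line, (leaf, root) in table.items()
        if line != control
    }


RATIOS = root_to_leaf(TABLE_17)
for line, ratio in RATIOS.items():
    print(f"{line}: root/leaf = {ratio:.1f}")
```

All four transgenic lines give a ratio well above 1, in line with the statement that expression was high in roots and low to absent in leaves.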



F. Summary of Expression Results. In the previous examples herein, no significant expression was observed in any maize tissue (although it was observed in rice) in the absence of an intron downstream from the per5 promoter. When the Adh1 intron was fused to the promoter (Examples 8, 10), expression in maize was observed. However, the Adh1 intron 1 was not capable of restoring the root-preferential expression in maize that is characteristic of the native per5 gene. Root-preferential expression was only achieved when the promoter was placed in combination with the per5 intron. This is the first demonstration of an intron directing tissue-specific or tissue-preferential expression in transgenic plants. Xu et al. (1994) have reported preliminary studies on the promoter of another root-preferential gene, the triosephosphate isomerase gene from rice. They found that an intron is required for expression from this promoter in rice protoplasts, but the effects of the intron on gene expression in mature tissues have not been described.

The mechanism for enhancement by an intron is not well understood. The effect appears to be post-transcriptional (rather than a promoter-like effect on the initiation of transcription) because the enhancements are only seen when the intron is present in the region of DNA that is transcribed (Callis, 1987). Introns could play a role in stabilizing the pre-mRNA in the nucleus, or in directing subsequent processing (Luehrsen and Walbot, 1991). The root-preferential expression of the per5 promoter-intron combination could be explained by a requirement for the intron in processing, together with a limited tissue distribution of other factor(s) necessary for correct processing.

Example 19
Plasmid p188-1

Plasmid p188-1 is a clone of the per5 3' UTR. The per5 3' UTR was amplified from plasmid Xba4, which contains the 4.1 kb XbaI fragment from nt 2532 to 6438 of SEQ ID NO 1, using the forward primer, AAA GAG CTC TO A QQQ CAC TGA f>rGT CGC TTG ATG TGC (SEQ ID NO 24), which introduced an SstI site on the 5' end, and the reverse primer, GGG GAA TTC rrr, oat ATA TGC COT GA A CAA TTG TTA TGT TAC (SEQ ID NO 25), which introduced an EcoRI site on the 3' end of a 366 bp segment of the per5 3' UTR (corresponding to nt 6066 to 6431 of SEQ ID NO 1). Sequences homologous to the per5 3' UTR are underlined. The primers were synthesized on a 394 DNA/RNA Synthesizer (Applied Biosystems, Foster City, CA). Amplification reactions were completed with the Expand™ Long Template PCR System (Boehringer Mannheim, Indianapolis, IN). Plasmid Xba4 amplifications were cycled with a 56°C annealing temperature. Amplification products were separated and visualized by 1.0% agarose gel electrophoresis. Resulting amplification products were excised from the agarose and the DNA was purified using Qiaex II (Qiagen, Hilden, Germany). The products were ligated into pCR2.1 from the Original TA Cloning Kit (Invitrogen Corporation, San Diego, CA). Recombinant plasmids were selected on Luria agar (Gibco, Bethesda, MD) containing 75 mg/liter ampicillin (Sigma, St Louis, MO) and 40 ml/plate of a 40 mg/ml stock of X-gal (Boehringer Mannheim, Indianapolis, IN). Plasmid DNAs were purified using the Wizard™ plus Miniprep DNA Purification System (Promega, Madison, WI). DNA was analyzed and subcloned with restriction endonucleases and T4 DNA ligase from Bethesda Research Laboratories (Bethesda, MD). The resultant per5 3' UTR clone was named p188-1.

Example 20
pTGP190-1

Plasmid pTGP190-1 is a 5887 bp plasmid comprising a gene cassette in which the following components are operably joined: the 35T promoter, the GUS gene, and the per5 3' UTR. The complete sequence of pTGP190-1 is given in SEQ ID NO 26. With reference to SEQ ID NO 26, important features of pTGP190-1 include:
Table 18: Significant Features of pTGP190-1

nt (SEQ ID NO 26)   Features
12-17               PstI site
18-30               linker
31-282              CaMV MCASTRAS nt 7093-7344
283-290             linker
291-637             CaMV DNA MCASTRAS 7093-7439
638-657             linker
650-655             BamHI site
651-1024            374 bp BamHI/NcoI fragment containing MSV leader and Adh1 intron
658-677             MSV nt 167-186
678-767             MSV nt 188-277
769-774             BglII/BclI junction
769-978             Adh1.S intron with deletion described in Example 24
979-988             linker
982-987             BamHI/BglII junction
989-1028            MSV nt 278-317
1024-1029           NcoI site
1026-2834           β-glucuronidase coding sequence (GUS)
2835-2890           sequence from pKA882
2890-2895           SstI site
2896-3261           Per5 3' UTR nt 6066 to 6431 of SEQ ID NO 1
3262-3267           EcoRI site
3268-5897           pUC19 sequences



Construction of pTGP190-1. The per5 3' UTR was excised from p188-1 (Example 19) using the SstI/EcoRI sites and purified from an agarose gel as described above. This fragment was ligated to the SstI/EcoRI A fragment of pDAB305. (pDAB305 is described in detail in Example 24. Plasmid pDAB305 is a 5800 bp plasmid that contains a heterologous promoter which is known as 35T. Construction of the 35T promoter is described in detail in Example 24. Basically this construct contains tandem copies of the Cauliflower Mosaic Virus 35S promoter (35S), a deleted version of the Adh1 intron 1, and the untranslated leader from the Maize Streak Mosaic Virus (MSV) Coat Protein fused to the β-glucuronidase gene, which is then followed by the nos 3' UTR.) The SstI/EcoRI A fragment of pDAB305 deletes the nos 3' UTR. Ligations were transformed into DH5α (Bethesda Research Laboratory, Bethesda, MD) and DNA was extracted as described above. Sequence across the promoter/GUS junction was verified using the Dye Terminator Cycle Sequencing Ready Reaction Kit (Perkin Elmer, Foster City, CA) and a 373A DNA Sequencer (Applied Biosystems, Foster City, CA). Computer analysis of the sequences was facilitated by Sequencher™ 3.0 (Gene Codes Corporation, Ann Arbor, MI). Plasmid pTGP190-1 is identical to pDAB305 except for the substitution of the per5 3' UTR for the nos 3' UTR following the GUS gene.

Example 21
UGP232-4

Plasmid UGP232-4 is similar to pTGP190-1, but contains the ubiquitin 1 (ubi) promoter and intron 1 from maize in place of the 35T promoter. The ubi promoter was excised on a HindIII/NcoI fragment from pDAB1538 (described in Example 29) and ligated to the HindIII/NcoI A fragment of pTGP190-1 to derive UGP232-4. The complete sequence for UGP232-4 is given in SEQ ID NO 27. With reference to SEQ ID NO 27, important features of UGP232-4 are given in Table 19.

Table 19: Significant Features of UGP232-4

nt (SEQ ID NO 27)   Features
1-5                 HindIII site
1-14                pUC19 polylinker
15-993              ubiquitin promoter from maize
994-2007            ubiquitin intron
2008-2026           Synthetic polylinker from previous constructs (KpnI, SmaI and SalI)
2025-2030           NcoI site
2027-3835           β-glucuronidase coding sequence (GUS)
3836-3890           sequence from pKA882
3891-3896           SstI site
3897-4262           Per5 3' UTR nt 6066 to 6431 of SEQ ID NO 1
4263-4268           EcoRI site
4269-6898           pUC19 sequence



pUGN81-3 was used as the Ubiquitin/GUS/nos control plasmid.

Example 22
Quantitative Transient Assays of Maize Callus
Bombarded with pTGP190-1 or UGP232-4

A. Preparation of DNA for transient testing. Each of the test constructs, in addition to pDAB305 (described in Example 24), was co-precipitated onto gold particles with pDeLux (described in Example 26) according to the following protocol. Equal molar amounts of the GUS constructs were used. A total of 140 µg of DNA, 70 µg of pDeLux plus 70 µg of test DNA and Bluescript® II SKT DNA (when necessary), was diluted in sterile water to a volume of 300 µL. The DNA and water were added to 60 mg of surface-sterilized 1.0 µm spherical gold particles (Bio-Rad Laboratories, Hercules, CA). The mixture was vortexed briefly (approximately 15 seconds) before adding 74 µL of 2.5 M calcium chloride and 30 µL of 0.1 M spermidine (free base). After vortexing for 30 seconds, the DNA and gold were allowed to precipitate from solution. The supernatant was removed and 1 mL of ethanol was added. The DNA/gold mixture was diluted 1:8 before use for transformation.
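
The protocol states that equal molar amounts of the GUS constructs were used and that Bluescript DNA was added when necessary, but it does not list the individual masses. The Python sketch below illustrates one way such a plan could be worked out, under three assumptions that are not taken from the source: mass scales with plasmid length for equal moles, the largest construct is used at the full 70 µg, and the remainder is filled with Bluescript DNA. The sizes for pDAB305 (5800 bp) and pTGP190-1 (5887 bp) are those given in the text; 6898 bp for UGP232-4 is inferred from the last coordinate in Table 19.

```python
# Plasmid sizes in bp. 6898 for UGP232-4 is an inference from Table 19.
PLASMID_BP = {"pDAB305": 5800, "pTGP190-1": 5887, "UGP232-4": 6898}
TEST_DNA_UG = 70.0  # µg of test DNA plus filler per precipitation


def equimolar_masses(sizes_bp, total_ug=TEST_DNA_UG):
    """Return {plasmid: (construct µg, filler µg)} giving equal moles of each construct.

    Assumes the largest plasmid is used at total_ug and that smaller plasmids
    are scaled by length, with the difference made up by filler DNA.
    """
    largest = max(sizes_bp.values())
    plan = {}
    for name, bp in sizes_bp.items():
        construct_ug = total_ug * bp / largest
        plan[name] = (construct_ug, total_ug - construct_ug)
    return plan


PLAN = equimolar_masses(PLASMID_BP)
for name, (construct_ug, filler_ug) in PLAN.items():
    print(f"{name}: {construct_ug:.1f} µg construct + {filler_ug:.1f} µg filler")
```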

B. Transient testing in maize callus. Regenerable (Type II) maize callus was pretreated on osmotic medium (N6 salts and vitamins (Chu (1978)), 1 mg/L 2,4-dichlorophenoxyacetic acid, 0.2 M sorbitol, 0.2 M mannitol, 7 g/L Gelrite, pH 5.8) for approximately 16 hours. Afterward, it was placed onto 60 x 20 mm plates of osmotic medium solidified with 2% agar for helium blasting. Cages of 104 µm mesh screen covered each "target" (500-600 mg of callus) to prevent splattering and loss of tissue. Targets were individually blasted with DNA/gold mixture using the helium blasting device described in Example 10. Under a vacuum of 650 mm Hg, at a shooting distance of 10 cm and a pressure of 1500 psi, DNA/gold mixture was accelerated toward each target four times, delivering 20 µL per shot. The targets were rotated 180° after each blast. The tissue was also mixed halfway through the blasting procedure to expose unblasted callus. Upon completion of blasting, the targets were again placed onto the original osmotic medium for overnight incubation at 26°C in the dark.

Four Type II callus cell lines were selected for each experiment. Two targets from each line were used per treatment group. Also, two nontransformed controls (NTC) were included within each experiment, composed of tissue pooled from all four lines. These controls were transferred to osmotic and blasting media according to the protocol above, but were not subjected to helium blasting.

C. GUS quantitative analysis. Approximately 20 hours after blasting, 200-400 mg of each target was transferred to a 1.5 mL sample tube (Kontes, Vineland, NJ). For extraction of proteins, callus was homogenized using a stainless steel Kontes Pellet Pestle powered by a 0.35 amp, 40 Watt motor (Model 102, Rae Corporation, McHenry, IL), at a setting of "90". Cell Culture Lysis Reagent from a Luciferase Assay kit (Promega, Madison, WI) served as the extraction buffer. Protease inhibitors, phenylmethylsulfonyl fluoride (PMSF) and leupeptin hemisulfate salt, were added to the lysis buffer at concentrations of 1 mM and 50 µM, respectively. Before grinding, 0.5 µL of lysis buffer per mg tissue was added to the sample tube. The callus was homogenized in four 25-second intervals with a 10-second incubation on ice following each period of grinding. Afterward, 1.0 µL of lysis buffer per mg tissue was added to the sample, which was maintained on ice until all sample grinding was completed. The samples were then centrifuged twice at 5°C for 7 minutes at full speed (Eppendorf Centrifuge Model 5415). After the first spin, the supernatant from each tube was removed and the pellet was discarded. Callus extracts (supernatants) were also collected after the second spin and maintained on ice for GUS and Luciferase (LUC) analyses.

From the LUC Assay kit, LUC Assay Buffer was prepared according to the manufacturer's instructions by reconstituting lyophilized luciferin substrate. This buffer was warmed to room temperature and loaded into the dispensing pump of an automatic luminescence photometer (Model 1251 Luminometer and Model 1291 Dispenser, Bio-Orbit, Finland). Each sample was tested in triplicate by adding 20 µL of extract to three polypropylene luminometer vials (Wallac, Gaithersburg, MD). Per vial, 100 µL of assay buffer was dispensed, and luminescence was detected over a 45-second integration period. "Blank reactions", including 20 µL of extraction buffer rather than callus extract, were also measured within each experiment to determine the extent of background readings of the luminometer.

For analysis of GUS activity, a GUS-Light™ assay kit (Tropix, Bedford, MA) was used. Again, each sample was tested in triplicate, using 20 µL of extract per luminometer vial. GUS-Light™ Reaction Buffer was prepared from the assay kit by diluting liquid Glucuron™ substrate according to the manufacturer's instructions. This buffer was warmed to room temperature and added in 180 µL aliquots to each luminometer vial at 7-second intervals. After a one hour incubation at room temperature, 300 µL of GUS-Light™ Light Emission Accelerator Buffer was added and luminescence was detected over a 5-second integration period. "Blank reactions" were also included in the GUS assay, using 20 µL of extraction buffer rather than callus extract.

GUS and LUC results were reported in relative light units (RLU). Both "blank" and NTC readings were subtracted from sample RLU levels. For comparison of one construct to another, GUS readings were normalized to LUC data by calculating GUS/LUC ratios for each sample tested. The ratios for all samples within a treatment group were then averaged and the means were subjected to a t-test for determination of statistical significance. Within each experiment, results were reported as a percent of pDAB305 expression.
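
The normalization described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: background correction follows the literal wording (both the blank and the NTC reading are subtracted), the function names are invented for this sketch, and all RLU values are hypothetical placeholders rather than data from the experiments.

```python
from statistics import mean


def background_corrected(reading, blank, ntc):
    """Subtract the blank and the non-transformed control (NTC) reading from a raw RLU value."""
    return reading - blank - ntc


def gus_luc_ratio(gus_rlu, luc_rlu):
    """Normalize a corrected GUS reading to the corrected LUC reading of the same sample."""
    if luc_rlu <= 0:
        raise ValueError("corrected LUC reading must be positive")
    return gus_rlu / luc_rlu


def percent_of_standard(sample_ratios, standard_ratios):
    """Mean GUS/LUC ratio of a treatment group as a percent of the standard's mean ratio."""
    return 100.0 * mean(sample_ratios) / mean(standard_ratios)


# Hypothetical background-corrected (GUS, LUC) readings for two targets each.
STANDARD = [(2000.0, 1000.0), (4200.0, 2000.0)]   # stands in for pDAB305
TEST = [(3000.0, 1000.0), (6300.0, 2000.0)]       # stands in for a test construct

standard_ratios = [gus_luc_ratio(g, l) for g, l in STANDARD]
test_ratios = [gus_luc_ratio(g, l) for g, l in TEST]
RESULT = percent_of_standard(test_ratios, standard_ratios)
print(f"test construct = {RESULT:.0f}% of standard")
```

Expressing each experiment relative to a shared standard is what allows results from separate bombardment experiments to be compared, since absolute RLU values depend on the day's tissue and delivery efficiency.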

Transient bombardment of Type II callus for each of the constructs was completed as described above. By including pDAB305 as a standard in each experiment and reporting results as a percent of the standard, data from numerous experiments could be meaningfully compared. Table 20 lists results from three experiments testing the nos versus the per5 3' UTRs using two promoters. With either the 35T or Ubi1 promoter, the per5 3' UTR resulted in higher transient GUS expression than the nos 3' end constructs.

pUGN223-3 is a plasmid that contains a fusion of the maize ubiquitin promoter and ubiquitin intron 1 to the GUS gene, similar to pUGP232-4. However, pUGN223-3 has the nos 3' UTR instead of the per5 3' UTR. pUGN223-3 was used as a control to directly compare expression relative to the 3' UTRs of per5 and nos in combination with the maize ubiquitin 1 (Ubi1) promoter and intron 1.

Table 20. Summary of transient GUS expression for all of the constructs tested.

Construct                            GUS/LUC Ratio (% of pDAB305)
pDAB305 (35T/GUS/nos) (control)      *100
pTGP190-1 (35T/GUS/per5)             *114
pUGN223-3 (Ubi/GUS/nos) (control)    †137
pUGP232-4 (Ubi/GUS/per5)             †163

* not significantly different (p=0.05)
† significantly different (p=0.05)

Transient analysis indicated that the per5 3' UTR functioned as well as nos when the GUS gene was driven by the 35T promoter and 19% better than nos when driven by the maize Ubiquitin 1 promoter. The reason for this increased efficiency is not known, but it could result from changes in the efficiency of processing or increased stability of the message.
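
The 19% figure follows directly from the Table 20 values, as the short Python check below shows. The percentages are transcribed from the table; the helper name is illustrative.

```python
# GUS/LUC ratios as a percent of pDAB305, transcribed from Table 20.
TABLE_20 = {
    "pDAB305": 100,    # 35T/GUS/nos
    "pTGP190-1": 114,  # 35T/GUS/per5
    "pUGN223-3": 137,  # Ubi/GUS/nos
    "pUGP232-4": 163,  # Ubi/GUS/per5
}


def percent_gain(test, control):
    """Relative increase of test over control, in percent."""
    return 100.0 * (test - control) / control


ubi_gain = percent_gain(TABLE_20["pUGP232-4"], TABLE_20["pUGN223-3"])
t35_gain = percent_gain(TABLE_20["pTGP190-1"], TABLE_20["pDAB305"])
print(f"Ubi1 promoter: per5 vs nos = +{ubi_gain:.0f}%")
print(f"35T promoter:  per5 vs nos = +{t35_gain:.0f}% (not significant per Table 20)")
```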

Example 23
Comparison of GUS Expression in Transformed Rice for Per5 3' UTR and nos 3' UTR Constructs

This example measures quantitative GUS expression levels obtained when the per5 3' UTR is used as a polyadenylation regulatory sequence (UGP232-4) in transgenic rice plants. In this example the GUS gene is driven by the maize ubiquitin 1 (Ubi1) promoter. Expression levels are compared with those from the nos 3' UTR sequence and the same promoter (Ubi1)/GUS fusion, pDAB1518 (described in Example 28).

A. Transgenic Production. As described in Example 9.

1. Plasmids. The plasmid UGP232-4, containing the GUS gene driven by the maize ubiquitin 1 promoter and the Per5 3' UTR, was described in Example 21. The plasmid pDAB354, which carries a gene for hygromycin resistance, was described in Example 25.

2. Rice Transformation. Production of transgenic rice plants was described in Example 9.

B. Expression Analysis. Analysis of GUS expression and Southern analysis techniques were described in Example 9. These results are summarized in Table 21 for 30 independent transgenic events recovered with UGP232-4 and 8 independent events from the control plasmid, pDAB1518 (described in Example 28).

Table 21: GUS Expression in Transformed Rice Plants for PER5 and NOS 3' UTR Constructs

                    GUS Activity (RLU/µg protein)    Presence of
Transgenic Event    Root          Leaf               Intact Construct
354/UGP-45          349,310       295,012            YES
354/UGP-36          326,896       172,316            YES
354/UGP-39          152,961       127,619            YES
354/UGP-40          126,027       106,275            YES
354/UGP-02          58,359        21,720             YES
354/UGP-03          54,509        20,758             YES
354/UGP-04          54,501        20,838             YES
354/UGP-10          53,222        26,514             YES
354/UGP-37          45,288        90,428             YES
354/UGP-34          43,226        7,180              NO*
354/UGP-48          37,284        28,029             YES
354/UGP-29          35,630        14,631             NO*
354/UGP-28          32,177        16,317             YES
354/UGP-19          29,646        13,143             NO*
354/UGP-31          29,520        19,774             YES
354/UGP-50          11,320        9,752              YES
354/UGP-44          9,301         9,556              NO*
354/UGP-35          7,113         2,062              YES
354/UGP-17          4,590         3,350              YES
354/UGP-27          3,367         975                YES
354/UGP-38          1,567         258                YES
354/UGP-22          1,202         1,229              YES
354/UGP-12          903           15                 YES
354/UGP-42          670           780                NO*
354/UGP-11          378           96                 YES
354/UGP-26          160           80                 YES
354/UGP-25          152           340                YES
354/UGP-18          77            26                 YES
354/UGP-06          69            95                 YES
354/UGP-24          43            26                 YES
1518-03             278,286       108,075            n.d.
1518-08             140,952       42,867             n.d.
1518-09             97,769        83,209             n.d.
1518-24             84,844        45,807             n.d.
1518-23             47,734        62,279             n.d.
1518-07             2,406         3,146              n.d.
1518-10             2,188         1,759              n.d.
                    44            52                 n.d.

*- The expected 3.9 kb fragment was not obtained but instead a range of 2 to 4 other 

hybridization bands were noted. 

n.d. = not determined 

For both constructs there was a great deal of variability in GUS expression
observed in both roots and leaves. Although a few events displayed higher GUS
expression with the UGP construct, overall the expression levels using the per5 3' UTR
were comparable to those of the nos 3' UTR. Southern analysis of plants from the 30
UGP232-4 events verified a 3.9 kb fragment corresponding to the GUS probe for the
majority of events. Overall, the per5 3' UTR demonstrates the ability to augment
expression as well as, or better than, the nos 3' UTR. The per5 3' UTR has also been used to
express the GUS reporter gene in stably transformed maize (Example 16). Therefore, this
sequence has broad utility as a 3' UTR for expression of transgenic products in monocots,
and probably in dicots.
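
Because expression varies over several orders of magnitude between events, a rank-based summary such as the median can be a convenient way to compare the two constructs. The following Python sketch is illustrative only and is not part of the described analysis. It uses the root values transcribed from Table 21; the control list contains only the seven events whose 1518 labels are legible, and the helper name `summarize` is hypothetical.

```python
from statistics import median

# Root GUS activity (RLU/ug protein) transcribed from Table 21.
ugp_root = [
    349310, 326896, 152961, 126027, 58359, 54509, 54501, 53222, 45288, 43226,
    37284, 35630, 32177, 29646, 29520, 11320, 9301, 7113, 4590, 3367,
    1567, 1202, 903, 670, 378, 160, 152, 77, 69, 43,
]
# Only the seven control events with legible 1518 labels are included here.
ctl_root = [278286, 140952, 97769, 84844, 47734, 2406, 2188]


def summarize(values):
    """Return event count, median, minimum and maximum for one construct."""
    return {
        "n": len(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


ugp_summary = summarize(ugp_root)
ctl_summary = summarize(ctl_root)

if __name__ == "__main__":
    print("UGP232-4 (per5 3' UTR):", ugp_summary)
    print("pDAB1518 (nos 3' UTR): ", ctl_summary)
```

With so few control events, such a summary only describes the spread of the data and does not replace a statistical test.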

Various combinations of the regulatory sequences from the per5 gene have proven
to have utility in driving the expression of transgenic products in multiple crops. Table 22
summarizes the transient and stable expression patterns observed from each of the
constructs tested in maize and the stable expression patterns observed in rice. These data
demonstrate the ability of any of the per5 promoter iterations to drive transgene expression.
An unexpected finding was that introns significantly affect tissue specificity of transgene
expression in stably transformed maize plants, but do not similarly affect expression in rice.
In stably transformed maize plants the Adh1 intron supported expression in all tissues,
whereas the per5 intron supported a tissue preferential pattern of expression. Finally, the
per5 3' UTR was capable of supporting transgenic expression when used in combination
with the per5 promoter or other heterologous promoters in maize or rice.

Table 22. Summary of GUS expression patterns observed from various per5 elements.

Promoter   Intron   3' UTR   Transient (root)   Stable Maize    Stable Rice

per5       -        nos      positive (low)     negative        n.d.
per5       -        per5     positive           negative        constitutive
per5       adh1     nos      positive           constitutive    constitutive
per5       per5     per5     n.d.               root specific   n.d.
35T        adh1     per5     positive           n.d.            n.d.
ubi        ubi      nos      positive (high)    n.d.            constitutive
ubi        ubi      per5     positive (high)    n.d.            constitutive

n.d. = not determined

Example 24
pDAB305

Plasmid pDAB305 is a 5800 bp plasmid that harbors a promoter containing a tandem
copy of the Cauliflower Mosaic Virus 35S enhancer (35S), a deleted version of the Adh1
intron 1, and the untranslated leader from the Maize Streak Mosaic Virus Coat Protein
fused to the β-glucuronidase gene, which is then followed by the nos 3' UTR.

A. Construction of a doubly-enhanced CaMV 35S Promoter.
This section describes molecular manipulations which result in a duplication of the
expression-enhancer element of a plant promoter. This duplication has been shown (Kay et
al. (1987)) to result in increased expression in tobacco plants of marker genes whose
expression is controlled by such a modified promoter. [Note: The sequences referred to in
this discussion are derived from the Cabb S strain of Cauliflower Mosaic Virus (CaMV).
They are available as the MCASTRAS sequence of GenBank, which is published (Franck
et al., 1980). All of the DNA sequences are given in the conventional 5' to 3' direction.]
The starting material is plasmid pUC13/35S(-343) as described by Odell et al. (1985). This
plasmid comprises, starting at the 3' end of the SmaI site of pUC13 (Messing (1983)) and
reading on the strand contiguous to the noncoding strand of the lacZ gene of pUC13,
nucleotides 6495 to 6972 of CaMV, followed by the linker sequence CATCGATG (which
contains a ClaI recognition site), followed by CaMV nucleotides 7089 to 7443, followed by
the linker sequence CAAGCTTG, the latter sequence comprising the recognition sequence
for HindIII, which is then followed by the remainder of the pUC13 plasmid DNA.

1. pUC13/35S(-343) DNA was digested with ClaI and NcoI, the 3429 base
pair (bp) large fragment was separated from the 66 bp small fragment by agarose gel
electrophoresis, and then purified by standard methods.

2. pUC13/35S(-343) DNA was digested with ClaI, and the protruding ends
were made flush by treatment with T4 DNA polymerase. The blunt-ended DNA was then
ligated to synthetic oligonucleotide linkers having the sequence CCCATGGG, which
includes an NcoI recognition site. The ligation reaction was transformed into competent
Escherichia coli cells, and a transformant was identified that contained a plasmid (named
pOO#1) that had an NcoI site positioned at the former ClaI site. DNA of pOO#1 was
digested with NcoI and the compatible ends of the large fragment were religated, resulting
in the deletion of 70 bp from pOO#1, to generate intermediate plasmid pOO#1 NcoΔ.

3. pOO#1 NcoΔ DNA was digested with EcoRV, and the blunt ends were
ligated to ClaI linkers having the sequence CATCGATG. An E. coli transformant
harboring a plasmid having a new ClaI site at the position of the previous EcoRV site was
identified, and the plasmid was named pOO#1 NcoΔ RV>Cla.

4. DNA of pOO#1 NcoΔ RV>Cla was digested with ClaI and NcoI,
and the small (268 bp) fragment was purified from an agarose gel. This fragment was then
ligated to the 3429 bp ClaI/NcoI fragment of pUC13/35S(-343) prepared above in step 1,
and an E. coli transformant that harbored a plasmid having ClaI/NcoI fragments of 3429 and
268 bp was identified. This plasmid was named pUC13/35S En.

5. pUC13/35S En DNA was digested with NcoI, and the protruding ends
were made blunt by treatment with T4 DNA polymerase. The treated DNA was then cut
with SmaI, and was ligated to BglII linkers having the sequence CAGATCTG. An E. coli
transformant that harbored a plasmid in which the 416 bp SmaI/NcoI fragment had been
replaced with at least two copies of the BglII linkers was identified, and named p35S En².
[NOTE: The tandemization of these BglII linkers generates, besides BglII recognition sites,
also PstI recognition sites, CTGCAG].
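
The note about tandem linkers can be checked with a simple string search. The following Python sketch is illustrative only: it joins two copies of the BglII linker and lists where the BglII (AGATCT) and PstI (CTGCAG) recognition sequences occur. The helper name `find_sites` is hypothetical, and only one strand is scanned, which is sufficient here because both recognition sequences are palindromic.

```python
def find_sites(seq, site):
    """Return the 0-based start position of every occurrence of site in seq."""
    hits = []
    start = seq.find(site)
    while start != -1:
        hits.append(start)
        start = seq.find(site, start + 1)
    return hits


BGLII_LINKER = "CAGATCTG"   # linker sequence given in step 5
BGLII_SITE = "AGATCT"
PSTI_SITE = "CTGCAG"

tandem = BGLII_LINKER * 2   # two linkers ligated head to tail

bglii_hits = find_sites(tandem, BGLII_SITE)
psti_hits = find_sites(tandem, PSTI_SITE)
psti_in_single_linker = find_sites(BGLII_LINKER, PSTI_SITE)

if __name__ == "__main__":
    print(tandem)
    print("BglII sites at:", bglii_hits)
    print("PstI sites at: ", psti_hits)
```

The PstI sequence spans the junction between the two linkers, so it is absent from a single linker and appears only when linkers are joined in tandem.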

The DNA structure of p35S En² is as follows: Beginning with the
nucleotide that follows the third C residue of the SmaI site on the strand contiguous to the
noncoding strand of the lacZ gene of pUC13; the linker sequence
CAGATCTGCAGATCTGCATGGGCGATG (SEQ ID NO 28), followed by CaMV
nucleotides 7090 to 7344, followed by the ClaI linker sequence CATCGATG, followed by
CaMV nucleotides 7089 to 7443, followed by the HindIII linker sequence CAAGCTT,
followed by the rest of pUC13 sequence. This structure has the feature that the enhancer
sequences of the CaMV 35S promoter, which lie in the region upstream of the EcoRV site
in the viral genome (nts 7090 to 7344), have been duplicated. This promoter construct
incorporates the native 35S transcription start site, which lies 11 nucleotides upstream of
the first A residue of the HindIII site.

B. Plasmids utilizing the 35S promoter and the Agrobacterium nos Poly A
sequences.

The starting material for the first construct is plasmid pBI221, purchased from
CLONTECH (Palo Alto, CA). This plasmid contains a slightly modified copy of the
CaMV 35S promoter, as described in Bevan et al. (1985), Baulcombe et al. (1986),
Jefferson et al. (1986) and Jefferson (1987). Beginning at the 3' end of the PstI site of
pUC19 (Yanisch-Perron et al. (1985)) and reading on the same strand as that which encodes
the lacZ gene of pUC19, the sequence is comprised of the linker nucleotides GTCCCC,
followed by CaMV nucleotides 6605 to 7439 (as described in 24A), followed by the linker
sequence GGGGACTCTAGAGGATCCCCGGGTGGTCAGTCCCTT (SEQ ID NO 29),
wherein the underlined bases (GGATCC) represent the BamHI recognition sequence. These
bases are then followed by 1809 bp comprising the coding sequence of the E. coli uidA gene,
which encodes the β-glucuronidase (GUS) protein, and 55 bp of 3' flanking bases that are
derived from the E. coli genome (Jefferson, 1986), followed by the SacI linker sequence
GAGCTC, which is then followed by the linker sequence GAATTTCCCC (SEQ ID NO 30).
These bases are followed by the RNA transcription termination/polyadenylation signal
sequences derived from the Agrobacterium tumefaciens nopaline synthase (nos) gene, and
comprise the 256 bp Sau3AI fragment corresponding to nucleotides 1298 to 1554 of
DePicker et al. (1982), followed by two C residues, the EcoRI recognition sequence
GAATTC, and the rest of pUC19.

1. pBI221 DNA was digested with EcoRI and BamHI, and the 3507 bp fragment
was purified from an agarose gel. pRAJ275 (CLONTECH, Jefferson, 1987) DNA was
digested with EcoRI and SalI, and the 1862 bp fragment was purified from an agarose gel.
These two fragments were mixed together, and complementary synthetic oligonucleotides
having the sequence GATCCGGATCCG (SEQ ID NO 31) and TCGACGGATCCG (SEQ
ID NO 32) were added. [These oligonucleotides when annealed have protruding single-stranded
ends compatible with the protruding ends generated by BamHI and SalI.] The
fragments were ligated together, and an E. coli transformant harboring a plasmid having the
appropriate DNA structure was identified by restriction enzyme analysis. DNA of this
plasmid, named pKA881, was digested with BalI and EcoRI, and the 4148 bp fragment was
isolated from an agarose gel. DNA of pBI221 was similarly digested, and the 1517 bp
EcoRI/BalI fragment was gel purified and ligated to the above pKA881 fragment, to
generate plasmid pKA882.

2. pKA882 DNA was digested with SacI, the protruding ends were made blunt by
treatment with T4 DNA polymerase, and the fragment was ligated to synthetic BamHI
linkers having the sequence CGGATCCG. An E. coli transformant that harbored a plasmid
having BamHI fragments of 3784 and 1885 bp was identified and named pKA882B.

3. pKA882B DNA was digested with BamHI, and the mixture of fragments was
ligated. An E. coli transformant that harbored a plasmid that generated a single 3783 bp
fragment upon digestion with BamHI was identified and named p35S/nos. This plasmid
has the essential DNA structure of pBI221, except that the coding sequences of the GUS
gene have been deleted. Therefore, CaMV nucleotides 6605 to 7439 are followed by the
linker sequence GGGGACTCTAGAGGATCCCGAATTTCCCC (SEQ ID NO 33), where
the single underlined bases (TCTAGA) represent an XbaI site, and the double underlined
bases (GGATCC) represent a BamHI site. The linker sequence is then followed by the nos
polyadenylation sequences and the rest of pBI221.

4. p35S/nos DNA was digested with EcoRV and PstI, and the 3037 bp fragment
was purified and ligated to the 534 bp fragment obtained from digestion of p35S En² DNA
with EcoRV and PstI. An E. coli transformant was identified that harbored a plasmid that
generated fragments of 3031 and 534 bp upon digestion with EcoRV and PstI, and the
plasmid was named p35S En²/nos. This plasmid contains the duplicated 35S promoter
enhancer region described for p35S En² in Example 24A Step 5, the promoter sequences
being separated from the nos polyadenylation sequences by linker sequences that include
unique XbaI and BamHI sites.

C. Construction of a synthetic untranslated leader. 

This example describes the molecular manipulations used to construct a DNA
fragment that includes sequences which comprise the 5' untranslated leader portion of the
major rightward transcript of the Maize Streak Virus (MSV) genome. The MSV genomic
sequence was published by Mullineaux et al. (1984), and Howell (1984), and the transcript
was described by Fenoll et al. (1988). The entire sequence, comprising 154 bp, was
constructed in three stages (A, B, and C) by assembling blocks of synthetic
oligonucleotides.

1. The A Block: Complementary oligonucleotides having the sequence
GATCCAGCTGAAGGCTCGACAAGGCAGATCCACGGAGGAGCTGATATTTGGTGGACA
(SEQ ID NO 34) and
AGCTTGTCCACCAAATATCAGCTCCTCCGTGGATCTGCCTTGTCCAGCCTTCAGCTG
(SEQ ID NO 35) were synthesized and purified by standard procedures. Annealing of
these nucleotides into double-stranded structures leaves 4-base single stranded protruding
ends [hereinafter referred to as "sticky ends"] that are compatible with those generated by
BamHI on one end of the molecule (GATC), and with HindIII-generated single stranded
ends on the other end of the molecule (AGCT). Such annealed molecules were ligated into
plasmid Bluescript® II SK- that had been digested with BamHI and HindIII. The
sequence of these oligonucleotides is such that, when ligated onto the respective BamHI
and HindIII sticky ends, the sequences of the respective recognition sites are maintained.
An E. coli transformant harboring a plasmid containing the oligonucleotide sequence was
identified by restriction enzyme analysis, and the plasmid was named pMSV A.

2. The B Block: Complementary oligonucleotides having the sequences
AGCTGTGGATAGGAGCAACCCTATCCCTAATATACCAGCACCACCAAGTCAGGGCAATCCCGGG
(SEQ ID NO 36) and
TCGACCCGGGATTGCCCTGACTTGGTGGTGCTGGTATATTAGGGATAGGGTTGCTCCTATCCAC
(SEQ ID NO 37) were synthesized and purified by standard procedures.
The underlined bases (CCCGGG) represent the recognition sequence for restriction enzymes
SmaI and XmaI. Annealing of these nucleotides into double-stranded structures leaves
4-base sticky ends that are compatible with those generated by HindIII on one end of the
molecule (AGCT), and with SalI-generated sticky ends on the other end of the molecule
(TCGA). The sequence of these oligonucleotides is such that, when ligated onto the HindIII
sticky ends, the recognition sequence for HindIII is destroyed.

DNA of pMSV A was digested with HindIII and SalI, and was ligated to the
above annealed oligonucleotides. An E. coli transformant harboring a plasmid containing
the new oligonucleotides was identified by restriction enzyme site mapping, and was
named pMSV AB.
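
Whether a HindIII site survives ligation depends only on the base that follows the AGCT overhang. The following Python sketch is illustrative only and rests on a simplifying assumption: HindIII cuts A^AGCTT, so the cut vector end contributes a single A immediately 5' of the oligonucleotide's AGCT overhang, and only that one strand is examined. The function name `hindiii_after_ligation` is hypothetical; the two inputs are the first twelve bases of SEQ ID NO 35 (A block) and SEQ ID NO 36 (B block).

```python
HINDIII_SITE = "AAGCTT"


def hindiii_after_ligation(oligo_5prime):
    """Report whether joining a HindIII-cut end to this oligo recreates AAGCTT.

    Assumption: the vector end supplies the single 'A' that precedes the
    AGCT overhang in the HindIII recognition sequence (A^AGCTT).
    """
    junction = "A" + oligo_5prime
    return junction.startswith(HINDIII_SITE)


a_block_end = "AGCTTGTCCACC"   # 5' end of SEQ ID NO 35 (A block)
b_block_end = "AGCTGTGGATAG"   # 5' end of SEQ ID NO 36 (B block)

a_block_keeps_site = hindiii_after_ligation(a_block_end)
b_block_keeps_site = hindiii_after_ligation(b_block_end)

if __name__ == "__main__":
    print("A block restores HindIII:", a_block_keeps_site)
    print("B block restores HindIII:", b_block_keeps_site)
```

The A block oligonucleotide continues with T after AGCT and so rebuilds AAGCTT, whereas the B block oligonucleotide continues with G, giving AAGCTG, which HindIII does not recognize.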

3. The C Block: Complementary oligonucleotides having the sequences
CCGGGCCATTTGTTCCAGGCACGGGATAAGCATTCAGCCATGGGATATCAAGCTTGGATCCC
(SEQ ID NO 38) and
TCGAGGGATCCAAGCTTGATATCCCATGGCTGAATGCTTATCCCGTGCCTGGAACAAATGGC
(SEQ ID NO 39) were synthesized and purified by standard procedures. The
oligonucleotides incorporate bases that comprise recognition sites (underlined) for NcoI
(CCATGG), EcoRV (GATATC), HindIII (AAGCTT), and BamHI (GGATCC). Annealing
of these nucleotides into double-stranded structures leaves 4-base sticky ends that are
compatible with those generated by XmaI on one end of the molecule (CCGG), and with
XhoI-generated sticky ends on the other end of the molecule (TCGA). Such annealed
molecules were ligated into pMSV AB DNA that had been digested with XmaI and XhoI.
An E. coli transformant harboring a plasmid containing the oligonucleotide sequence was
identified by restriction enzyme analysis, and DNA structure was verified by sequence
analysis. The plasmid was named pMSV CPL; it contains the A, B and C blocks of
nucleotides in sequential order ABC. Together, these comprise the 5' untranslated leader
sequence ("L") of the MSV coat protein ("CP") gene. These correspond to nucleotides 167
to 186, and 188 to 317 of the MSV sequence of Mullineaux et al. (1984), and are flanked
on the 5' end by the BamHI linker sequence GGATCCAG, and on the 3' end by the linker
sequence GATATCAAGCTTGGATCCC (SEQ ID NO 40). [Note: An A residue
corresponding to base 187 of the wild type MSV sequence was inadvertently deleted during
cloning.]

4. BglII Site Insertion: pMSV CPL DNA was digested at the SmaI site
corresponding to base 277 of the MSV genomic sequence, and the DNA was ligated to
BglII linkers having the sequence CAGATCTG. An E. coli transformant harboring a
plasmid having a unique BglII site at the position of the former SmaI site was identified
and verified by DNA sequence analysis, and the plasmid was named pCPL-Bgl.

D. Construction of a deleted version of the maize alcohol dehydrogenase 1
(Adh1) intron 1

The starting material is plasmid pVW119, which was obtained from V. Walbot,
Stanford University, Stanford, CA. This plasmid contains the DNA sequence of the maize
Adh1S gene, including intron 1, from nucleotides 119 to 672 [numbering of Dennis et al.
(1984)], and was described in Callis et al. (1987). In pVW119, the sequence following
base 672 of Dennis et al. (1984) is GACGGATCC, where the underlined bases (GGATCC)
represent a BamHI recognition site. The entire intron 1 sequence, with 14 bases of exon 1,
and 9 bases of exon 2, can be obtained from this plasmid on a 556 bp fragment following
digestion with BclI and BamHI.

1. Plasmid pSG3525a(Pst) DNA was digested with BamHI and BclI, and the 3430
bp fragment was purified from an agarose gel. [NOTE: The structure of plasmid
pSG3525a(Pst) is not directly relevant to the end result of this construction series. It was
constructed during an unrelated series, and was chosen because it contained restriction
recognition sites for both BclI and BamHI, and lacks HindIII and StuI sites. Those skilled
in the art will realize that other plasmids can be substituted at this step with equivalent
results.] DNA of plasmid pVW119 was digested with BamHI and BclI, and the gel
purified fragment of 546 bp was ligated to the 3430 bp fragment. An E. coli transformant
was identified that harbored a plasmid that generated fragments of 3430 and 546 bp upon
digestion with BamHI and BclI. This plasmid was named pSG AdhA1.

2. DNA of pSG AdhA1 was digested with HindIII [which cuts between bases 209
and 210 of the Dennis et al. (1984) sequence, bottom strand], and with StuI, which cuts
between bases 554 and 555. The ends were made flush by T4 DNA polymerase treatment,
and then ligated. An E. coli transformant that harbored a plasmid lacking HindIII and StuI
sites was identified, and the DNA structure was verified by sequence analysis. The plasmid
was named pSG AdhA1Δ. In this construct, 344 bp of DNA have been deleted from the
interior of the intron 1. The loss of these bases does not affect splicing of this intron. The
functional intron sequences are obtained on a 213 bp fragment following digestion with
BclI and BamHI.

3. DNA of plasmid pCPL-Bgl (Example 24C Step 4) was digested with BglII, and
the linearized DNA was ligated to the 213 bp BclI/BamHI fragment containing the deleted
version of the Adh1.S intron sequences from pSG AdhA1Δ. [Note: The sticky ends
generated by digestion of DNA with BglII, BclI, and BamHI are compatible, but ligation of
the BamHI or BclI sticky ends onto ones generated by BglII creates a sequence not cleaved
by any of these three enzymes.] An E. coli transformant was identified by restriction
enzyme site mapping that harbored a plasmid that contained the intron sequences ligated
into the BglII site, in the orientation such that the BglII/BclI juncture was nearest the 5' end
of the MSV CPL leader sequence, and the BglII/BamHI juncture was nearest the 3' end of
the CPL. This orientation was confirmed by DNA sequence analysis. The plasmid was
named pCPL A1I1Δ. The MSV leader/intron sequences can be obtained from this plasmid
by digestion with BamHI and NcoI, and purification of the 373 bp fragment.
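
The note on compatible sticky ends can be illustrated by assembling the hybrid junctions directly. The following Python sketch is illustrative only. It relies on the standard recognition sequences of the three enzymes (BglII A^GATCT, BclI T^GATCA, BamHI G^GATCC), each of which cuts after the first base to leave a GATC overhang. The function names `hybrid_junction` and `cleavable_by` are hypothetical.

```python
# Recognition sequences; each enzyme cuts after the first base,
# leaving a 5' GATC overhang.
SITES = {"BglII": "AGATCT", "BclI": "TGATCA", "BamHI": "GGATCC"}


def hybrid_junction(left_enzyme, right_enzyme):
    """Six-base sequence formed when an end cut by left_enzyme is ligated
    to an end cut by right_enzyme (top strand only)."""
    left_base = SITES[left_enzyme][:1]    # base remaining 5' of the cut
    right_part = SITES[right_enzyme][1:]  # GATC overhang plus the final base
    return left_base + right_part


def cleavable_by(seq):
    """Names of the three enzymes whose recognition sequence equals seq."""
    return [name for name, site in SITES.items() if site == seq]


bgl_bcl = hybrid_junction("BglII", "BclI")    # the BglII/BclI juncture
bam_bgl = hybrid_junction("BamHI", "BglII")   # the BamHI/BglII juncture

if __name__ == "__main__":
    for junction in (bgl_bcl, bam_bgl):
        print(junction, "cleaved by:", cleavable_by(junction) or "none")
```

Because the base flanking the shared GATC core comes from a different enzyme on each side, neither hybrid junction matches any of the three recognition sequences.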

E. Construction of plant expression vectors based on the enhanced 35S
promoter, the MSV CPL, and the deleted version of the Adh1 intron 1

1. DNA of plasmid p35S En²/nos was digested with BamHI, and the 3562 bp linear
fragment was ligated to a 171 bp fragment prepared from pMSV CPL DNA digested with
BamHI. This fragment contains the entire MSV CPL sequence described in Example 7C.
An E. coli transformant was identified by restriction enzyme site mapping that harbored a
plasmid that contained these sequences in an orientation such that the NcoI site was
positioned near the nos Poly A sequences. This plasmid was named p35S En² CPL/nos. It
contains the enhanced version of the 35S promoter directly contiguous to the MSV leader
sequences, such that the derived transcript will include the MSV sequences in its 5'
untranslated portion.

2. DNA of plasmid pKA882 (see Example 24B Step 1) was digested with HindIII
and NcoI, and the large 4778 bp fragment was ligated to an 802 bp HindIII/NcoI fragment
containing the enhanced 35S promoter sequences and MSV leader sequences from p35S
En² CPL/nos. An E. coli transformant harboring a plasmid that contained fragments of
4778 and 802 bp following digestion with HindIII and NcoI was identified, and named
pDAB310. In this plasmid, the enhanced version of the 35S promoter is used to control
expression of the GUS gene. The 5' untranslated leader portion of the transcript contains
the leader sequence of the MSV coat protein gene.

3. DNA of plasmid pDAB310 was digested with NcoI and SacI. The large 3717
bp fragment was purified from an agarose gel and ligated to complementary synthetic
oligonucleotides having the sequences CGGTACCTCGAGTTAAC (SEQ ID NO 41) and
CATGGTTAACTCGAGGTACCGAGCT (SEQ ID NO 42). These oligonucleotides,
when annealed into double stranded structures, generate molecules having sticky ends
compatible with those left by SacI on one end of the molecule, and with NcoI on the other
end of the molecule. In addition to restoring the sequences of the recognition sites for these
two enzymes, new sites are formed for the enzymes KpnI (GGTACC), XhoI (CTCGAG),
and HpaI (GTTAAC). An E. coli transformant was identified that harbored a plasmid that
contained sites for these enzymes, and the DNA structure was verified by sequence
analysis. This plasmid was named pDAB1148.
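
The sticky ends produced by annealing SEQ ID NO 41 and SEQ ID NO 42 can be worked out by locating the reverse complement of the shorter oligonucleotide inside the longer one. The following Python sketch is illustrative only. The helper names `revcomp` and `anneal` are hypothetical, and the sketch assumes perfect base pairing over the full length of the shorter strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def anneal(short_oligo, long_oligo):
    """Find where short_oligo pairs with long_oligo.

    Returns the unpaired tails of long_oligo as (five_prime, three_prime),
    or None if the two strands are not complementary.
    """
    paired = revcomp(short_oligo)
    start = long_oligo.find(paired)
    if start == -1:
        return None
    return long_oligo[:start], long_oligo[start + len(paired):]


seq41 = "CGGTACCTCGAGTTAAC"           # SEQ ID NO 41
seq42 = "CATGGTTAACTCGAGGTACCGAGCT"   # SEQ ID NO 42

tails = anneal(seq41, seq42)
new_sites = {
    "KpnI": "GGTACC" in seq42,
    "XhoI": "CTCGAG" in seq42,
    "HpaI": "GTTAAC" in seq42,
}

if __name__ == "__main__":
    print("unpaired tails of SEQ ID NO 42:", tails)
    print("new sites present:", new_sites)
```

The 5' CATG tail matches the overhang left by NcoI (C^CATGG) and the 3' AGCT tail matches the overhang left by SacI (GAGCT^C), in agreement with the text.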

4. DNA of plasmid pDAB1148 was digested with BamHI and NcoI, the large 3577
bp fragment was purified from an agarose gel and ligated to a 373 bp fragment purified
from pCPL A1I1Δ (Example 24D Step 3) following digestion with BamHI and NcoI. An
E. coli transformant was identified that harbored a plasmid with BamHI and NcoI, and the
plasmid was named pDAB303. This plasmid has the following DNA structure: beginning
with the base after the final G residue of the PstI site of pUC19 (base 435), and reading on
the strand contiguous to the coding strand of the lacZ gene, the linker sequence
ATCTGCATGGGTG (SEQ ID NO 43), nucleotides 7093 to 7344 of CaMV DNA, the
linker sequence CATCGATG, nucleotides 7093 to 7439 of CaMV, the linker sequence
GGGGACTCTAGAGGATCCAG (SEQ ID NO 44), nucleotides 167 to 186 of MSV,
nucleotides 188 to 277 of MSV, a C residue followed by nucleotides 119 to 209 of Adh1S,
nucleotides 555 to 672 of maize Adh1S, the linker sequence GACGGATCTG, nucleotides
278 to 317 of MSV, the polylinker sequence
GTTAACTCGAGGTACCGAGCTCGAATTTCCCC (SEQ ID NO 45) containing
recognition sites for HpaI, XhoI, KpnI, and SacI, nucleotides 1298 to 1554 of nos, and a G
residue followed by the rest of the pUC19 sequence (including the EcoRI site). It is
noteworthy that the junction between nucleotide 317 of MSV and the long polylinker
sequence creates an NcoI recognition site.
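
The four recognition sites named for the polylinker of SEQ ID NO 45 can be located with a direct search. The following Python sketch is illustrative only. The dictionary of recognition sequences uses the standard sites for these enzymes, and the variable names are hypothetical.

```python
POLYLINKER = "GTTAACTCGAGGTACCGAGCTCGAATTTCCCC"   # SEQ ID NO 45

# Standard recognition sequences for the enzymes named in the text.
RECOGNITION = {
    "HpaI": "GTTAAC",
    "XhoI": "CTCGAG",
    "KpnI": "GGTACC",
    "SacI": "GAGCTC",
}

# 0-based start position of each site (str.find returns -1 if absent).
positions = {name: POLYLINKER.find(site) for name, site in RECOGNITION.items()}

# Enzyme names ordered as their sites occur along the polylinker.
site_order = sorted(positions, key=positions.get)

if __name__ == "__main__":
    for name in site_order:
        print(f"{name:5s} {RECOGNITION[name]} at position {positions[name]}")
```

The sites occur in the order listed in the text, and neighbouring sites overlap by one base, which is why all four fit within the first 22 bases of the polylinker.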
5. DNA of plasmid pDAB303 was digested with NcoI and SacI, and the 3939 bp
fragment was ligated to the 1866 bp fragment containing the GUS coding region prepared
from similarly digested DNA of pKA882. The appropriate plasmid was identified by
restriction enzyme site mapping, and was named pDAB305. This plasmid has the
enhanced promoter, MSV leader and Adh1 intron arrangement of pDAB303, positioned to
control expression of the GUS gene.
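
A quick consistency check on a two-fragment ligation is to add the stated fragment sizes and compare the total with the expected plasmid size. The following Python sketch is illustrative only. It uses fragment sizes stated in Example 24 and simply sums them, ignoring any bookkeeping of overhang bases; the variable names are hypothetical.

```python
# Fragment sizes (bp) stated in Example 24 for three two-fragment ligations.
ligations = {
    "pUC13/35S En": (3429, 268),    # Example 24A, step 4
    "pDAB310": (4778, 802),         # Example 24E, step 2
    "pDAB305": (3939, 1866),        # Example 24E, step 5
}

plasmid_sizes = {name: sum(frags) for name, frags in ligations.items()}

# Example 24 opens by describing pDAB305 as a 5800 bp plasmid.
STATED_PDAB305_SIZE = 5800
pdab305_difference = plasmid_sizes["pDAB305"] - STATED_PDAB305_SIZE

if __name__ == "__main__":
    for name, size in plasmid_sizes.items():
        print(f"{name}: {size} bp")
    print("pDAB305 versus stated size:", pdab305_difference, "bp")
```

The summed size for pDAB305 is within a few base pairs of the 5800 bp figure given at the start of Example 24, as expected for a rounded value.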

Example 25
Plasmid pDAB354

All procedures were by standard methods as taken from Maniatis et al. (1982).
Step 1: Plasmid pIC19R (Marsh et al. (1984)) was digested to completion with
restriction enzyme SacI, the enzyme was inactivated by heat treatment, and the plasmid
DNA was ligated on ice overnight with an 80-fold excess of nonphosphorylated
oligonucleotide linker having the sequence 5' GAGTTCAGGCTTTTTCATAGCT 3' (SEQ
ID NO 46), where AGCT is complementary to the overhanging ends generated by SacI
digestion. The linker-tailed DNA was then cut to completion with enzyme HindIII, the
enzyme was inactivated, and the DNA precipitated with ethanol.

Step 2: Plasmid pLG62 contains a 3.2 Kb SalI fragment that includes the
hygromycin B phosphotransferase (resistance) gene as set forth in Gritz and Davies (1983).
One microgram of these fragments was isolated from an agarose gel and digested to
completion with restriction enzyme HphI to generate fragments of 1257 bp. The enzyme
was inactivated, and the 3' ends of the DNA fragments were resected by treatment with T4
DNA polymerase at 37° for 30 min in the absence of added deoxynucleotide triphosphates.

Step 3: Following inactivation of the polymerase and ethanol precipitation
of the DNA, the fragments prepared in Step 2 were mixed in Nick Translation Salts
(Maniatis et al., 1982) with the linker-tailed vector prepared in Step 1, heated 5 min at 65°,
and slowly cooled to 37°. The non-annealed ends were made blunt and single-stranded
regions filled in by treatment with the Klenow fragment of Escherichia coli DNA
polymerase by incubation at 37° for 45 min, and then the mixture was ligated overnight at
15°. Following transformation into E. coli MC1061 cells and plating on LB agar with 50
µg each of ampicillin and hygromycin B, an isolate was identified that contained a plasmid
which generated appropriately-sized fragments when digested with EcoRI, PstI, or Hindi.
DNA sequence determination of a portion of this plasmid (pHYG1) revealed the sequence
5' AGATCTCGTGAGATAATGAAAAAG 3' (SEQ ID NO 47), where the underlined
ATG represents the start codon of the hygromycin B resistance gene, and AGATCT is the
BglII recognition sequence. In pHYG1, downstream of the hygromycin B resistance
coding region, are about 100 bases of undetermined sequence that were deleted in the next
step.
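
The relationship between the Step 1 linker and the sequence found in pHYG1 is easier to see on the complementary strand. The following Python sketch is illustrative only. It reverse-complements the linker (SEQ ID NO 46) and compares it with the reported pHYG1 sequence (SEQ ID NO 47); the helper name `revcomp` and the variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


linker = "GAGTTCAGGCTTTTTCATAGCT"            # SEQ ID NO 46 (Step 1)
phyg1_junction = "AGATCTCGTGAGATAATGAAAAAG"  # SEQ ID NO 47 (Step 3)

linker_rc = revcomp(linker)

# Bases that follow the AGCT end on the complementary strand of the linker.
after_agct = linker_rc[4:]
start_codon_region = after_agct[:9]

# Does the pHYG1 sequence end with the same nine bases?
shared_with_phyg1 = phyg1_junction.endswith(start_codon_region)

if __name__ == "__main__":
    print("linker, complementary strand:", linker_rc)
    print("first nine bases after AGCT: ", start_codon_region)
    print("found at end of SEQ ID NO 47:", shared_with_phyg1)
```

Read on the complementary strand, the linker begins with AGCT and then carries ATGAAAAAG, the same bases that contain the start codon in the pHYG1 sequence. This is consistent with the linker annealing to the resected 5' end of the coding region in Step 3.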

Step 4: DNA of plasmid pHYG1 was digested to completion with restriction
enzyme BamHI, and the linear fragment thus produced was partially digested with ScaI.
Fragments of 3644 bp were isolated from an agarose gel and ligated to phosphorylated,
annealed complementary oligonucleotides having the sequences:
5' ACTCGCCGATAGTGGAAACCGACGCCCCAGCACTCGTCCGAGGGCAAAGGAATAGTAAGAGCTCGG 3'
(SEQ ID NO 48), and
5' GATCCCGAGCTCTTACTATTCCTTTGCCCTCGGACGAGTGCTGGGGCGTCGGTTTCCACTATCGGCGAGT 3'
(SEQ ID NO 49). When annealed, these oligonucleotides have a protruding 4-base overhang
on one end that is complementary to that generated by BamHI. Following transformation of
the ligation mixture into E. coli DH5α cells and selection on LB media containing 50 µg/ml of
ampicillin, a transformant was identified that contained a plasmid which generated
expected fragments when digested with BamHI, BglII, EcoRI, or SacI. This plasmid was
named pHYG1 3'Δ. The sequence of this plasmid downstream from the stop codon of the
hygromycin B resistance coding region (underlined TAG in above sequence; Gritz and
Davies, 1983) encodes the recognition sequence for SacI.


Step 5. DNA of plasmid pDAB309 was digested to completion with
restriction enzyme BsmI, and the ends were made blunt by treatment with T4 DNA
polymerase. Plasmid pDAB309 has the same basic structure as pDAB305 described
elsewhere herein, except that a kanamycin resistance (NPTII) coding region is substituted
for the GUS coding region present in pDAB305. This DNA was then ligated to
phosphorylated, annealed oligonucleotide BglII linkers having the sequence 5'
CAGATCTG 3'. A transformed colony of DH5α cells harboring a plasmid that generated
appropriately-sized fragments following BglII digestion was identified. This plasmid was
named pDAB309(Bg). DNA of plasmid pDAB309(Bg) was cut to completion with SacI,
and the linearized fragments were partially digested with BglII. Fragments of 3938 bp
(having ends generated by BglII and SacI) were isolated from an agarose gel.

Step 6. DNA of plasmid pHYG1 3'Δ was digested to completion with BglII
and SacI. The 1043 bp fragments were isolated from an agarose gel and ligated to the 3938
bp BglII/SacI fragments of pDAB309(Bg) prepared above. After transformation into E.
coli DH5α cells and selection on ampicillin, a transformant was identified that harbored a
plasmid which generated the appropriately-sized restriction fragments with BglII plus SacI,
PstI, or EcoRI. This plasmid was named pDAB354. Expression of the hygromycin B
resistance coding region is placed under the control of essentially the same elements as the
GUS coding region in pDAB305.

Example 26
Plasmid pDeLux

Production of the GUS protein from genes controlled by different promoter
versions was often compared relative to an internal control gene that produced firefly
luciferase (DeWet et al. (1987)). A plasmid (pT3/T7-1 LUC) containing the luciferase
(LUC) coding region was purchased from CLONTECH (Palo Alto, CA), and the coding
region was modified at its 5' and 3' ends by standard methods. Briefly, the sequences
surrounding the translational start (ATG) codon were modified to include an NcoI site
(CCATGG) and an alanine codon (GCA) at the second position. At the 3' end, an SspI
recognition site positioned 42 bp downstream of the Stop codon of the luciferase coding
region was made blunt ended with T4 DNA polymerase, and ligated to synthetic
oligonucleotide linkers encoding the BglII recognition sequence. These modifications
permit the isolation of the intact luciferase coding region on a 1702 bp fragment following
digestion by NcoI and BglII. This fragment was used to replace the GUS gene of plasmid
pDAB305 (see Example 24E, step 5), such that the luciferase coding region was expressed
from the enhanced 35S promoter, resulting in plasmid pDeLux. The 5' untranslated leader
of the primary transcript includes the modified MSV leader/Adh intron sequence.

Example 27
Plasmid pDAB367

Plasmid pDAB367 has the following DNA structure: beginning with the base after the final C
residue of the SphI site of pUC19 (base 441), and reading on the strand contiguous to the
LacZ gene coding strand, the linker sequence
CTGCAGGCCGGCCTTAATTAAGCGGCCGCGTTTAAACGCCCGGGCATTTAAATGG
CGCGCCGCGATCGCTTGCAGATCTGCATGGGTG (SEQ ID NO 50), nucleotides 7093
to 7344 of CaMV DNA (Frank et al. (1980)), the linker sequence CATCGATG, nucleotides
167 to 186 of MSV (Mullineaux et al. (1984)), nucleotides 188 to 277 of MSV (Mullineaux et
al. (1984)), a C residue followed by nucleotides 119 to 209 of maize Adh1S containing parts
of exon 1 and intron 1 (Denis et al. (1984)), nucleotides 555 to 672 containing parts of Adh1S
intron 1 and exon 2 (Denis et al. (1984)), the linker sequence GACGGATCTG (SEQ ID NO
51), and nucleotides 278 to 317 of MSV. This is followed by a modified BAR coding region
from pIJ4104 (White et al. (1990)) having the AGC serine codon in the second position
replaced by a GCC alanine codon, and nucleotide 546 of the coding region changed from G to
A to eliminate a BglII site. Next come the linker sequence TGAGATCTGAGCTCGAATTTCCCC
(SEQ ID NO 52), nucleotides 1298 to 1554 of nos (DePicker et al. (1982)), and a G residue
followed by the rest of the pUC19 sequence (including the EcoRI site).
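
The first linker in this structure (SEQ ID NO 50) is dense with restriction sites, which a short scan makes visible. The sketch below is illustrative only: the text does not name the enzymes for this linker, so the recognition sequences in SITES are standard published values supplied here as an assumption, and find_sites is a hypothetical helper. Positions are 0-based offsets into the linker.

```python
# Recognition sequences are standard values supplied for illustration;
# the source text does not list enzymes for this linker.
SITES = {
    "PstI": "CTGCAG", "FseI": "GGCCGGCC", "PacI": "TTAATTAA",
    "NotI": "GCGGCCGC", "PmeI": "GTTTAAAC", "SrfI": "GCCCGGGC",
    "SwaI": "ATTTAAAT", "AscI": "GGCGCGCC", "SgfI": "GCGATCGC",
    "BglII": "AGATCT",
}

# Linker sequence as given in the text (SEQ ID NO 50), joined across the line break.
LINKER = (
    "CTGCAGGCCGGCCTTAATTAAGCGGCCGCGTTTAAACGCCCGGGCATTTAAATGG"
    "CGCGCCGCGATCGCTTGCAGATCTGCATGGGTG"
)


def find_sites(seq, sites=SITES):
    """Map each enzyme name to the 0-based start positions of its site in seq."""
    hits = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)  # allow overlapping matches
        if positions:
            hits[name] = positions
    return hits


for enzyme, positions in sorted(find_sites(LINKER).items(), key=lambda kv: kv[1]):
    print(f"{enzyme:6s} {positions}")
```

Under these assumptions each listed site occurs exactly once in the linker, which is what one would expect of a polylinker built from rare-cutting enzymes.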

Example 28 
Plasmid pDAB1518

pDAB1518 has the following DNA structure: the sequence CCGCGG, bases -899
to +1093 of the maize ubiquitin 1 (Ubi1) promoter and Ubi1 intron 1 described by
Christensen et al. (1992), a polylinker consisting of the sequence
GGTACCCCCGGGGTCGACCATGG (SEQ ID NO: 53) (containing restriction sites for
KpnI, SmaI, SalI, and NcoI, with the NcoI site containing the translational fusion ATG),
bases 306-2153 of the β-glucuronidase gene from pRAJ220 described by Jefferson et al.
(1986), the sequence GGGAATTGGAGCTCGAATTTCCCC (SEQ ID NO: 54), bases
1298 to 1554 of nos (Depicker et al. (1982)), and the sequence GGGAAATTAAGCTT
(SEQ ID NO: 55), followed by pUC18 (Yanisch-Perron et al., 1985) sequence from base
398 to base 399 (reading on the strand opposite to the strand contiguous to the LacZ gene
coding strand).

Example 29
Plasmid pDAB1538

pDAB1538 has the following DNA structure: the sequence AGCGGCCGCATTCCCGG
GAAGCTTGCATGCCTGCAGAGATCCGGTACCCGGGGATCCTCTAGAGTCGAC
(SEQ ID NO: 56), bases -899 to +1093 of the maize ubiquitin 1 (Ubi1) promoter and Ubi1
intron 1 described by Christensen et al. (1992), a polylinker consisting of the sequence
GGTACCCCCGGGGTCGACCATGGTTAACTCGAGGTACCGAGCTCGAATTTCCCC
(SEQ ID NO: 57), bases 1298 to 1554 of nos (Depicker et al. (1982)), and the sequence
GGGAATTGGTTTAAACGCGGCCGCTT (SEQ ID NO: 58), followed by pUC19 (Yanisch-
Perron et al., 1985) sequence starting at base 400 and ending at base 448 (reading on the
strand opposite to the strand contiguous to the LacZ gene coding strand). The NcoI site in the
Ubi1 sequence beginning at base 143 was replaced by the sequence CCATGCATGG (SEQ ID
NO: 59).

(i) APPLICANT: Ainley, Michael
Armstrong, Katherine
Belmar, Scott 
Folkerts, Otto 
Hopkins, Nicole 
Menke, Michael A. 
Pareddy, Dayakar 
Petolino, Joseph F. 
Smith, Kelley 
Woosley, Aaron 

(ii) TITLE OF INVENTION: Regulatory Sequences for Transgenic Plants 
(iii) NUMBER OF SEQUENCES: 27 



(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: DowElanco Patent Department 

(B) STREET: 9330 Zionsville Road 

(C) CITY: Indianapolis 

(D) STATE: Indiana 

(E) COUNTRY: USA 

(F) ZIP: 46268 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30
(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER : 

(B) FILING DATE: 

(C) CLASSIFICATION: 
(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Stuart, Donald R 
(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 317 337 4816 

(B) TELEFAX: 317 337 4847 
(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6550 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 4201..4425
(D) OTHER INFORMATION: /product= "Peroxidase"
(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 4426..5058
(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 5059..5250
(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 5251..5382
(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 5383..5548
(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 5549..5649
(ix) FEATURE:
(A) NAME/KEY: exon
(B) LOCATION: 5650..6065
(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: join(4201..4425, 5059..5250, 5383..5547, 5649..6068)
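
As a consistency check on the CDS feature above, the spliced coding length can be computed from the join() coordinates. This is a minimal sketch; cds_length is a hypothetical helper and is not part of PatentIn. Segment ends are treated as inclusive, as in the feature table.

```python
import re


def cds_length(location: str) -> int:
    """Sum the lengths of the inclusive start..end segments in a join(...) location."""
    segments = re.findall(r"(\d+)\.\.(\d+)", location)
    return sum(int(end) - int(start) + 1 for start, end in segments)


location = "join(4201..4425, 5059..5250, 5383..5547, 5649..6068)"
total = cds_length(location)
print(total, total // 3, total % 3)
```

The four segments sum to a whole number of codons, and the codon count agrees with the 334 positions given for SEQ ID NO:2 if the terminating "*" is counted as one position.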

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
CCATGGCCAG TTGCCGGTGG AGCAGGTAAA AACACCGTAG CGTAGCAGCC AGGCGGAAGC 60 
AGACGCACAG CACAGGTTGG TTATGATAGT CAGCCGGGCC ACATGTGTGT AGTTGGTACA 120 
CTGATACGCT TACACTGTCT CTCCTTTCTT TTTTATTTGT CACCTTTGGT CGAGCTTACA 180 
TAATTGTGTG ACTAAAAAAA GGTCACTTCA TTCAGAAATT TAGGGTTGTG GGAATTTTGG 240 
ATTTTATTGT GTCTGTATAG AGTAGCTATA GCTAGCTAGC TAGATGTGAT GTTAATAATT 300 
ATGACGATGA GATTGGCCCG CTTGGCCGCT TGCATTGTCT CCCTAGCTCA ATAATGTTTT 360
GAGTTTGTCT TGCCTTTCTT TCAGCTCTAA CAAATTGGAG TAGGGATQAC TGAGATACAT 420
ATATAAAAGC GAAAACCGCT GCTCTCTGTT AATTATTGCA CATCACACAT AGGCCAAGCC 480
TTAAGGACAA TCAACTAAGG ATGGTAATAA CTAAGGCTAG TGAGGTCGAA CTAGGGATGT 540
TAATATACTC TAGATTTTAG ACTATAAAAT TTAAGGATCG AATCAGATTA GTATCGAACT 600
ATATTTATAT TCATTTCTAA ACTAAATTAA TTAAGCACCC TAAATTATTG TGATGAAGAG 660
ACATTTCGAT CGTGATCCAT TATTACTCCT TGGTCAAACT AATCTCGTTT TATGTCACTA 720
TTTCATCATC TTTTTTGCGA ACGGGTTTAT AGCCCGTGTT CCATTATGAG GACATGAACG 780
GTTTAAACAA AGTTACATAT CATCCCAGCT AGCTACCTAG ATTGGAAGCA TGGGTTCGGT 840
ATATATATAT AGTTTATATA TTTGGTATAT ATATATATAT ATATATATAT ATATATATAT 900
CACACGTCAG CTTATATTAC GTAAAGTGGG GTTAGTTTTC AAGAAGCGTG GGACCAGTCA 960
CCTCTGCAGT CTGACCTTGG CTTCAGCTTC GACAGCAAAC AGTCATCTCT TGGAAGCTAA 1020
GGACAGTCTC CAACAGTCAA CAAAGCAGCG GTCTGCTTGT AGTTCTCCCT TGCACGACCA 1080
GCTATATCTA GCATCATAAC AACGGTAAGA TCATCTCTAG CACGACAAAC TTAGTTTAAT 1140
TAATTATGTC TAATCCGTTG TTGTTAGCTT AAACTTTCTA GCCTCCTATG CTAAGAGAGT 1200
TCTCTAGTTC TACTCAGGTG GATTGATATA TAAATTGGGA ATCTTCTAGG CGTCACAAGG 1260
TATGGTACAC ATCAATCAAT GAACGGACAA AGCAACGGTA AGATCCGACC CAGTAAAAGT 1320
AATAGCGTTA GGGCATGTAC AACCTAGACA CTGATGCACA GTACTCCAAG TATAAGACAC 1380
AACTAAAACA CAACATAATA ATACAGTGGT TATATCTAAA ACATGTGTCT TACCATATTC 1440
ATTGTACCAA TTAGAACATT TAATAAATTA AAGTGACCAA TCAGCTAGCC TCCTGTCTCG 1500
AACATAGAGC TAAGACATTG TGTCTTCGTC AAGATACATG TCTTAAGTTT TTTTATATTC 1560
ACTCCCAAAG ACACACTCTA AGACACAACG TAACACACCC ATTGTACATG CTCTTAACCT 1620
AAGTTATCAT GGATGACCAC GCGTGGCAAT TAAAAAAATA ATTTTTGCCT CCTAAAACCT 1680
CTTTCTTAAT TGGTTCTTGC TTGCAAATCA CCAGCGAACC CATATGAAAG GATGCTCAAA 1740
ATCTGGCCAC CGCATCAGGG TTGGTGAATG CAACGTAAAA AATAATGCAT AAATCAGCTC 1800
TCTGATCAGT TATATAATCG TGCCTTTTAA TTATTCATGC CAGCTTTATC TGACTCACGA 1860
AATCATTGAT AAATTATTCC TCAGCTGTAT TAGAAAGAGC AGTGTTGTTT AACTTGGAAA 1920
GTGATGTGGA AGCGTGTGAT TGCGGTTGAG CTTGTATAGG AGTAAAATGA GGAACAGTAG 1980
GAAAATAATT TTTTCGGATT AAAACCGGTT GTTTGGACTG CGGCAGATAC AATTCATAGA 2040
GATAAAAACA CCGTAGAAGT ATTAGAAGCC GATAAAGATT AAACCCAAAT GAACGAACAG 2100
GCTAAACAAA TCCGGCGCCT CAAAAGTCAA GAGCAGGTAC TGGGCTGTCT TGCACACGTC 2160
GCTTTTTGTC TCCCCCTGGC CCCTGGGTGA GAGTAGTAGG GATGCTAAAG TTTGCTTTCT 2220
CTTTTTGAGG CATGTGATAG GCTCTTGTTA GTTGCTAGGG CTATGTTTAT AATATTTGCG 2280
CTTTTACCTA TGTACGTAAG AACCGGATGG AATAATGCTA TGCAGGAACC AATTATGTTT 2340
GGTCGAAATA [three 10-base groups illegible in source] TdATGTACCT AGGTGGCTAA 2400
TGATATACGG CATATGAATA CAGTAATCAT CCAAGCACGT AAAAACTCGC TAGACGTTTA 2460
TGCCTGCTAG CCTGCTGGGT GTGTAGACTG GAGTACTGGA CAAACATCGC AATACAGAGG 2520
TACAGTATTT GTCTAGACAA TGATATACAT AGATAAAAAC CACTGTTGTA ACTTGTAAGC 2580
CACTAGCTCA CGTTCTCCAT GAGCTCTTCT CTCTGCTGTT TCTTCCTCTG CTAACTGCGT 2640






TATGATATGA CGTCGTATAA ATAATCTCAC AATACTTCCT TATTTTCAGC ATGGCCTCTT 2700
TTATGTTTAT TTAACAGTAG CAACCAACGC CGCTCGATGT TTCCTTCAAG AAACGGCCAC 2760
TCACTATGTG GTGTGCAGAA GAACAAATGT AAGCAGCTCC TACAGGTACC AGTAGTCATG 2820
TCAGTGTGGA AGCTTTCCAA CCAACGCCTC CTTCGAGGAA CCTGGTCGTG CTGACATGAA 2880
TGTAGGCCAT GCAAGCACAA GCACCTAACG CGAATCATCA CGACGCGCCG TGTACTGGGC 2940
GTTGGTACAT CACACCCCGC GTTTGACCTG ATCGGAAGCA TGCGTGTGTG TTGGCTGCAG 3000
GACCGGCTAT AGGTTTCCTG CATTGGACAG CAGAAGCCAG TCATGTTAGG CACTCACGCG 3060
CTCCTGCCGT TTGATGAATC ATCCGGTCTT TCGTATTGAT CACTAGTTCA CTACGCTGAT 3120
ATAGCAAATT TTAAGATGTG AAACCACGAG ACGAGCGATA AATCTTAGAC GTTACCTATC 3180
CATATGAAGC TTGTGCGAAA AAAAGGCGTG CCGCTGTAGC ATCATTCGTA TACACTTTTG 3240
TCCCCAAAGA CAGGGATACG AATCCATGCT CGACAGAACC CTCCCTTCCC TGCAGATAAC 3300
GACACTTAAG TATAACAAAA GTAGTTGGAT TATTTCAGAA GCAAAATCTC ACTTTTCGCT 3360
GGCCTTTTTG TACTTTGGTT ACTTGAGTTC AGACAGTGTA TGCTATATTG TCATGTGCTG 3420
CGTAAGGTTT AAATATGGTT CGACAAATAT ATCAGTATAT CACTACTTTG TTATGGGTGG 3480
GGCCTAGCAC AAACTTGATA CAGCTAGGAT AAAGTTAGAA CGATGACTGA TCTACTGTAA 3540
AGCGACACCT GTCCTGTTAT GGTAGTTTAA GTCCATTCCT GGACGACTCC AGATCCAGGA 3600
TATGATGCTG TTACATAATG CGATTGTTCA CAATAAAATT GCATGATGTT CTTCTACTCT 3660
TTAGGCAGTT TTGTTCAACA GGCAAGTTGC ATAATGCATG TGCATATATG AGCAGCATAA 3720
TCATCAATTA ATCATAGGTT CGTCATTTTA GTTTCACTCC TTCACATTAT TCCAGCCCTT 3780
GAAGAAAAAT GTAGCAGTGC TTGCTGTTTA ATAAGTGGCA GAGCTGTTTT CACTCCACCT 3840
ACGCTTGTCT AGGACCAAAA TTTTAATCTG TCACTTTGAG CTAAAACTGA AGCACCAAAC 3900
CGCTACAAAA GAACGTAGGA GCTGAATTGT AACTTGATGG GATTACTATA GCAGTTGCTA 3960
CAGTTCTAGC TAGCTACCTT ATTCTATACG CATCACCCTA ACAACCCGGC TGACTGCTGC 4020
ATCTGACCCC ACCGTCCCCT GCTCCAAACC AACTCTCCTT TCCTTGCATG CACTACACCC 4080
ACTTCCTGCA GCTATATATA CCACCATATG CCCATCTTAT GAAACCATCC ACAAGAGGAG 4140
AAGAAACAAT CAACCAGCAA CACTCTTCTC TTATAACATA GTACAGCGAA GGTAACTCAC 4200

ATG GCA ACT TCC ATG GGT TGT CTC GTC TTG CTC TGC CTT GTT TCT TCT 4248
Met Ala Thr Ser Met Gly Cys Leu Val Leu Leu Cys Leu Val Ser Ser
1               5                   10                  15

CTC CTT CCC AGT GCC GTC CTT GGC CAC CCA TGG GGT GGC TTG TTC CCA 4296
Leu Leu Pro Ser Ala Val Leu Gly His Pro Trp Gly Gly Leu Phe Pro
            20                  25                  30

CAG TTC TAT GAC CAT TCG TGC CCC AAG GCG AAG GAG ATT GTG CAG TCC 4344
Gln Phe Tyr Asp His Ser Cys Pro Lys Ala Lys Glu Ile Val Gln Ser
        35                  40                  45

ATT GTG GCA CAG GCT GTG GCC AAG GAG ACC AGG ATG GCG GCA TCT TTA 4392
Ile Val Ala Gln Ala Val Ala Lys Glu Thr Arg Met Ala Ala Ser Leu
    50                  55                  60

GTC AGA CTG CAT TTC CAT GAC TGC TTT GTC AAG GTTCAATTCT GCTTCCTCTG 4445
Val Arg Leu His Phe His Asp Cys Phe Val Lys
65                  70                  75



TTATGTTCTT TATATTACAT GCTCTGACAA AGCTATAAAG CTTGATACTG CAGTATAATA 4505
TAACAAGTTA GCTACACAAG TTTTGTACTT CAAGTCTTTT AACTATATGT TGGTGCAATA 4565
AGATTATGAG TAATCCATAT GAAGGTGTTG CAAGAGAACA TGAAAGGCAA AGATAAACGG 4625
ATGAACCCAT TACTAGCTTT GGCTGTATCA GACCAATAAC TTGAAATGCA CTTGTGCTAG 4685
CATGCCTAAG TATTAGAAAA GGTAGCATGG GAGAATCTAT ATTATTTTGG CTAACTTCTT 4745
TAGTTACTAT TGATTGATGA GAAAGCCTAC CATTGCCCAT GCCAGCCCTA ATGTCCCGGT 4805
GACATGATTG AGCCAGTACT ATGATTAATT TACTCTATTG TTCTCCTTTT TTGAGTGCTG 4865
TATAAGATGT CCTTTTTTTG AGCCACTCGA GAAGATGTTT ACTTAACTCT AGTGCGCAAT 4925
GATTGGAGCT CTCAGTGCAA CGCATGTGCT CTGTAATCTA CTGTCACCAC TACTCTGTAG 4985
TGTGTGCTTA AACTCTAAAC TATTCCACGT GGCTAGTAAT TACCAATCAT TTACAACACT 5045

GTTACATGTG TAG GGC TGC GAT GCT TCG GTG CTG TTG GAC AAC AGC AGC 5094
               Gly Cys Asp Ala Ser Val Leu Leu Asp Asn Ser Ser
                               80                  85

AGC ATA GTT AGT GAG AAA GGG TCC AAC CCG AAC AGG AAC TCC CTC AGG 5142
Ser Ile Val Ser Glu Lys Gly Ser Asn Pro Asn Arg Asn Ser Leu Arg
        90                  95                  100

GGG TTT GAG GTG ATC GAC CAG ATT AAG GCT GCT CTT GAG GCT GCC TGC 5190
Gly Phe Glu Val Ile Asp Gln Ile Lys Ala Ala Leu Glu Ala Ala Cys
    105                 110                 115

CCA GGC ACA GTC TCC TGT GCC GAC ATT GTT GCC CTT GCG GCT CGT GAT 5238
Pro Gly Thr Val Ser Cys Ala Asp Ile Val Ala Leu Ala Ala Arg Asp
120                 125                 130                 135

TCC ACC GCC CTG GTATGTTCCA CTATCGACAA TCCTTTCCAA CCTCAAGGAA 5290
Ser Thr Ala Leu



CAGACATGAT ATTTGTGTGT GTGTGTGTGT GTATATATAT ATATAGTGAT AGCTTTGGCA 5350
AACTTAGATA TTTTCTGAGC TCTAAACCGT AG GTT GGT GGA CCA TAC TGG GAC 5403
                                    Val Gly Gly Pro Tyr Trp Asp
                                    140                 145

GTG CCA CTT GGC CGG AGA GAC TCG CTC GGT GCA AGC ATC CAG GGC TCC 5451
Val Pro Leu Gly Arg Arg Asp Ser Leu Gly Ala Ser Ile Gln Gly Ser
            150                 155                 160

AAC AAT GAC ATC CCA GCC CCC AAC AAC ACA CTC CCC ACT ATC ATC ACC 5499
Asn Asn Asp Ile Pro Ala Pro Asn Asn Thr Leu Pro Thr Ile Ile Thr
        165                 170                 175

AAG TTC AAG CGC CAG GGC CTC AAT GTT GTT GAT GTT GTC GCC CTC TCA 5547
Lys Phe Lys Arg Gln Gly Leu Asn Val Val Asp Val Val Ala Leu Ser
    180                 185                 190

GGTGATTTTT CTTGTATTTA TTAGTAACAT CTGTCCTTCG TTATTCACCA ACTTAGCGCA 5607
CACTCATATT ACGCATGGAT ACAATATCAT GTGTGAATAC A GGT GGT CAC ACC 5660
                                              Gly Gly His Thr
                                              195

ATT GGT ATG TCT CGG TGC ACT AGT TTC CGG CAG AGG CTA TAC AAC CAG 5708
Ile Gly Met Ser Arg Cys Thr Ser Phe Arg Gln Arg Leu Tyr Asn Gln
    200                 205                 210

ACA GGC AAT GGC ATG GCT GAC AGC ACA CTG GAT GTA TCC TAC GCC GCA 5756
Thr Gly Asn Gly Met Ala Asp Ser Thr Leu Asp Val Ser Tyr Ala Ala
215                 220                 225                 230

AAG CTG AGG CAG GGA TGC CCC CGC TCT GGT GGT GAC AAC AAC CTC TTC 5804
Lys Leu Arg Gln Gly Cys Pro Arg Ser Gly Gly Asp Asn Asn Leu Phe
                235                 240                 245

CCC TTG GAC TTC ATC ACC CCT GCC AAG TTT GAC AAT TTT TAC TAC AAG 5852
Pro Leu Asp Phe Ile Thr Pro Ala Lys Phe Asp Asn Phe Tyr Tyr Lys
            250                 255                 260

AAC CTC CTG GCC GGC AAG GGC CTT CTA AGC TCT GAT GAG ATT CTG TTA 5900
Asn Leu Leu Ala Gly Lys Gly Leu Leu Ser Ser Asp Glu Ile Leu Leu
        265                 270                 275

ACC AAG AGC GCT GAG ACA GCG GCC CTC GTG AAG GCA TAT GCT GCT GAT 5948
Thr Lys Ser Ala Glu Thr Ala Ala Leu Val Lys Ala Tyr Ala Ala Asp
    280                 285                 290

GTC AAT CTC TTC TTC CAG CAC TTT GCA CAG TCT ATG GTG AAT ATG GGA 5996
Val Asn Leu Phe Phe Gln His Phe Ala Gln Ser Met Val Asn Met Gly
295                 300                 305                 310

AAC ATC TCG CCA CTG ACA GGG TCA CAA GGT GAG ATC AGG AAG AAC TGC 6044
Asn Ile Ser Pro Leu Thr Gly Ser Gln Gly Glu Ile Arg Lys Asn Cys
                315                 320                 325

AGG AGG CTC AAC AAT GAC CAC TGA GGGCACTGAA GTCGCTTGAT GTGCTGAATT 6098
Arg Arg Leu Asn Asn Asp His *
            330

GTTCGTGATG TTGGTGGCGT ATTTTGTTTA AATAAGTAAG CATGGCTGTG ATTTTATCAT 6158
ATGATCGATC TTTGGGGTTT TATTTAACAC ATTGTAAAAT GTGTATCTAT TAATAACTCA 6218
ATGTATAAGA TGTGTTCATT CTTCGGTTGC CATAGATCTG CTTATTTGAC CTGTGATGTT 6278
TTGACTCCAA AAACCAAAAT CACAACTCAA TAAACTCATG GAATATGTCC ACCTGTTTCT 6338
TGAAGAGTTC ATCTACCATT CCAGTTGGCA TTTATCAGTG TTGCAGCGGC GCTGTGCTTT 6398
GTAACATAAC AATTGTTCAC GGCATATATC CAAATCTAGA GGCCTACCAA AATGAGATAA 6458 
CAAGCCAACT AATCTGCTGG GAAATAGGTA ACAAGTCTCT AACAAGATCC GTTGACCTGC 6518 
AGGTCGACCT CGAGGGGGGG CCCGGTACCC AA 6550 
(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 334 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: protein
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ala Thr Ser Met Gly Cys Leu Val Leu Leu Cys Leu Val Ser Ser
1               5                   10                  15

Leu Leu Pro Ser Ala Val Leu Gly His Pro Trp Gly Gly Leu Phe Pro
            20                  25                  30

Gln Phe Tyr Asp His Ser Cys Pro Lys Ala Lys Glu Ile Val Gln Ser
        35                  40                  45

Ile Val Ala Gln Ala Val Ala Lys Glu Thr Arg Met Ala Ala Ser Leu
    50                  55                  60

Val Arg Leu His Phe His Asp Cys Phe Val Lys Gly Cys Asp Ala Ser
65                  70                  75                  80

Val Leu Leu Asp Asn Ser Ser Ser Ile Val Ser Glu Lys Gly Ser Asn
                85                  90                  95

Pro Asn Arg Asn Ser Leu Arg Gly Phe Glu Val Ile Asp Gln Ile Lys
            100                 105                 110

Ala Ala Leu Glu Ala Ala Cys Pro Gly Thr Val Ser Cys Ala Asp Ile
        115                 120                 125

Val Ala Leu Ala Ala Arg Asp Ser Thr Ala Leu Val Gly Gly Pro Tyr
    130                 135                 140

Trp Asp Val Pro Leu Gly Arg Arg Asp Ser Leu Gly Ala Ser Ile Gln
145                 150                 155                 160

Gly Ser Asn Asn Asp Ile Pro Ala Pro Asn Asn Thr Leu Pro Thr Ile
                165                 170                 175

Ile Thr Lys Phe Lys Arg Gln Gly Leu Asn Val Val Asp Val Val Ala
            180                 185                 190

Leu Ser Gly Gly His Thr Ile Gly Met Ser Arg Cys Thr Ser Phe Arg
        195                 200                 205

Gln Arg Leu Tyr Asn Gln Thr Gly Asn Gly Met Ala Asp Ser Thr Leu
    210                 215                 220

Asp Val Ser Tyr Ala Ala Lys Leu Arg Gln Gly Cys Pro Arg Ser Gly
225                 230                 235                 240

Gly Asp Asn Asn Leu Phe Pro Leu Asp Phe Ile Thr Pro Ala Lys Phe
                245                 250                 255

Asp Asn Phe Tyr Tyr Lys Asn Leu Leu Ala Gly Lys Gly Leu Leu Ser
            260                 265                 270

Ser Asp Glu Ile Leu Leu Thr Lys Ser Ala Glu Thr Ala Ala Leu Val
        275                 280                 285

Lys Ala Tyr Ala Ala Asp Val Asn Leu Phe Phe Gln His Phe Ala Gln
    290                 295                 300

Ser Met Val Asn Met Gly Asn Ile Ser Pro Leu Thr Gly Ser Gln Gly
305                 310                 315                 320

Glu Ile Arg Lys Asn Cys Arg Arg Leu Asn Asn Asp His *
                325                 330
(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (synthetic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
TTYCAYGAYT GYTTYGTYAA YGGBTG 26
(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (synthetic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
SGTRTGSGCS CCGSWSAGVG CSAC 24
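
SEQ ID NO:3 and SEQ ID NO:4 are written with IUPAC ambiguity codes (Y, R, S, W, B, V), so each represents a pool of distinct oligonucleotides. The sketch below counts the members of each pool. It is illustrative only: the IUPAC table is the standard nucleotide code, and degeneracy is a hypothetical helper, not something described in the text.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer (spaces ignored)."""
    return prod(len(IUPAC[base]) for base in primer.upper() if not base.isspace())


print(degeneracy("TTYCAYGAYT GYTTYGTYAA YGGBTG"))  # SEQ ID NO:3
print(degeneracy("SGTRTGSGCS CCGSWSAGVG CSAC"))    # SEQ ID NO:4
```

Each two-fold code doubles the pool and each three-fold code triples it, so the pool size is the product over all positions.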
(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1354 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
ATCAACCAGC AACACTCTTC TCTTATAACA TAGTACAGCG AAGGTAACTC ACATGGCAAC 60
TTCCATGGGT TGTCTCGTCT TGCTCTGCCT TGTTTCTTCT CTCCTTCCCA GTGCCGTCCT 120
TGGCCACCCA TGGGGTGGCT TGTTCCCACA GTTCTATGAC CATTCGTGCC CCAAGGCGAA 180
GGAGATTGTG CAGTCCATTG TGGCACAGGC TGTGGCCAAG GAGACCAGGA TGGCGGCATC 240
TTTAGTCAGA CTGCATTTCC ATGACTGCTT TGTCAAGGGC TGCGATGCTT CGGTGCTGTT 300
GGACAACAGC AGCAGCATAG TTAGTGAGAA AGGGTCCAAC CCGAACAGGA ACTCCCTCAG 360
GGGGTTTGAG GTGATCGACC AGATTAAGGC TGCTCTTGAG GCTGCCTGCC CAGGCACAGT 420
CTCCTGTGCC GACATTGTTG CCCTTGCGGC TCGTGATTCC ACCGCCCTGG TTGGTGGACC 480
ATACTGGGAC GTGCCACTTG GCCGGAGAGA CTCGCTCGGT GCAAGCATCC AGGGCTCCAA 540
CAATGACATC CCAGCCCCCA ACAACACACT CCCCACTATC ATCACCAAGT TCAAGCGCCA 600
GGGCCTCAAT GTTGTTGATG TTGTCGCCCT CTCAGGTGGT CACACCATTG GTATGTCTCG 660
GTGCACTAGT TTCCGGCAGA GGCTATACAA CCAGACAGGC AATGGCATGG CTGACAGCAC 720
ACTGGATGTA TCCTACGCCG CAAAGCTGAG GCAGGGATGC CCCCGCTCTG GTGGTGACAA 780
CAACCTCTTC CCCTTGGACT TCATCACCCC TGCCAAGTTT GACAATTTTT ACTACAAGAA 840
CCTCCTGGCC GGCAAGGGCC TTCTAAGCTC TGATGAGATT CTGTTAACCA AGAGCGCTGA 900
GACAGCGGCC CTCGTGAAGG CATATGCTGC TGATGTCAAT CTCTTCTTCC AGCACTTTGC 960
ACAGTCTATG GTGAATATGG GAAACATCTC GCCACTGACA GGGTCACAAG GTGAGATCAG 1020
GAAGAACTGC AGGAGGCTCA ACAATGACCA CTGAGGGCAC TGAAGTCGCT TGATGTGCTG 1080
AATTGTTCGT GATGTTGGTG GCGTATTTTG TTTAAATAAG TAAGCATGGC TGTGATTTTA 1140
TCATATGATC GATCTTTGGG GTTTTATTTA ACACATTGTA AAATGTGTAT CTATTAATAA 1200
CTCAATGTAT AAGATGTGTT CATTCTTCGG TTGCCATAGA TCTGCTTATT TGACCTGTGA 1260
TGTTTTGACT CCAAAAACCA AAATCACAAC TCAATAAACT CATGGAATAT GTCCACCTGT 1320
TTCTTGAAAA AAAAAAAAAA AAAAAAAAAA AAAA 1354



(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:
GTCATAGAAC TGTGGG 16
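
A quick check suggests that SEQ ID NO:6 is complementary to part of the cDNA of SEQ ID NO:5. The sketch below takes the reverse complement of the 16-mer and looks for it in bases 121 to 180 of SEQ ID NO:5 as listed above. The helper reverse_complement and the variable names are hypothetical, and the reading of this oligonucleotide as an antisense primer is an inference from the sequences alone, not a statement from the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string; spaces are ignored."""
    return seq.replace(" ", "").upper().translate(COMPLEMENT)[::-1]


seq6 = "GTCATAGAAC TGTGGG"  # SEQ ID NO:6
# Bases 121-180 of SEQ ID NO:5 as printed in the listing.
cdna_121_180 = (
    "TGGCCACCCA TGGGGTGGCT TGTTCCCACA GTTCTATGAC CATTCGTGCC CCAAGGCGAA"
).replace(" ", "")

site = reverse_complement(seq6)
offset = cdna_121_180.find(site)  # 0-based offset within the 60-base window
print(site, offset)
```

A non-negative offset means the oligonucleotide would anneal to the sense strand at that position, counted from base 121 of SEQ ID NO:5.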
(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 16 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:
ATAACATAGT ACAGCG 16

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10160 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular
(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:
GGGCCCGCTA GCGGTACCCC CGGGGTCGAC CATGGTCCGT CCTGTAGAAA CCCCAACCCG 60
TGAAATCAAA AAACTCGACG GCCTGTGGGC ATTCAGTCTG GATCGCGAAA ACTGTGGAAT 120
TGATCAGCGT TGGTGGGAAA GCGCGTTACA AGAAAGCCGG GCAATTGCTG TGCCAGGCAG 180
TTTTAACGAT CAGTTCGCCG ATGCAGATAT TCGTAATTAT GCGGGCAACG TCTGGTATCA 240
GCGCGAAGTC TTTATACCGA AAGGTTGGGC AGGCCAGCGT ATCGTGCTGC GTTTCGATGC 300
GGTCACTCAT TACGGCAAAG TGTGGGTCAA TAATCAGGAA GTGATGGAGC ATCAGGGCGG 360
CTATACGCCA TTTGAAGCCG ATGTCACGCC GTATGTTATT GCCGGGAAAA GTGTACGTAT 420
CACCGTTTGT GTGAACAACG AACTGAACTG GCAGACTATC CCGCCGGGAA TGGTGATTAC 480
CGACGAAAAC GGCAAGAAAA AGCAGTCTTA CTTCCATGAT TTCTTTAACT ATGCCGGAAT 540
CCATCGCAGC GTAATGCTCT ACACCACGCC GAACACCTGG GTGGACGATA TCACCGTGGT 600
GACGCATGTC GCGCAAGACT GTAACCACGC GTCTGTTGAC TGGCAGGTGG TGGCCAATGG 660
TGATGTCAGC GTTGAACTGC GTGATGCGGA TCAACAGGTG GTTGCAACTG GACAAGGCAC 720
TAGCGGGACT TTGCAAGTGG TGAATCCGCA CCTCTGGCAA CCGGGTGAAG GTTATCTCTA 780
TGAACTGTGC GTCACAGCCA AAAGCCAGAC AGAGTGTGAT ATCTACCCGC TTCGCGTCGG 840
CATCCGGTCA GTGGCAGTGA AGGGCGAACA GTTCCTGATT AACCACAAAC CGTTCTACTT 900
TACTGGCTTT GGTCGTCATG AAGATGCGGA CTTACGTGGC AAAGGATTCG ATAACGTGCT 960
GATGGTGCAC GACCACGCAT TAATGGACTG GATTGGGGCC AACTCCTACC GTACCTCGCA 1020
TTACCCTTAC GCTGAAGAGA TGCTCGACTG GGCAGATGAA CATGGCATCG TGGTGATTGA 1080
TGAAACTGCT GCTGTCGGCT TTAACCTCTC TTTAGGCATT GGTTTCGAAG CGGGCAACAA 1140
GCCGAAAGAA CTGTACAGCG AAGAGGCAGT CAACGGGGAA ACTCAGCAAG CGCACTTACA 1200
GGCGATTAAA GAGCTGATAG CGCGTGACAA AAACCACCCA AGCGTGGTGA TGTGGAGTAT 1260
TGCCAACGAA CCGGATACCC GTCCGCAAGT GCACGGGAAT ATTTCGCCAC TGGCGGAAGC 1320
AACGCGTAAA CTCGACCCGA CGCGTCCGAT CACCTGCGTC AATGTAATGT TCTGCGACGC 1380
TCACACCGAT ACCATCAGCG ATCTCTTTGA TGTGCTGTGC CTGAACCGTT ATTACGGATG 1440
GTATGTCCAA AGCGGCGATT TGGAAACGGC AGAGAAGGTA CTGGAAAAAG AACTTCTGGC 1500
CTGGCAGGAG AAACTGCATC AGCCGATTAT CATCACCGAA TACGGCGTGG ATACGTTAGC 1560
CGGGCTGCAC TCAATGTACA CCGACATGTG GAGTGAAGAG TATCAGTGTG CATGGCTGGA 1620
TATGTATCAC CGCGTCTTTG ATCGCGTCAG CGCCGTCGTC GGTGAACAGG TATGGAATTT 1680
CGCCGATTTT GCGACCTCGC AAGGCATATT GCGCGTTGGC GGTAACAAGA AAGGGATCTT 1740
CACTCGCGAC CGCAAACCGA AGTCGGCGGC TTTTCTGCTG CAAAAACGCT GGACTGGCAT 1800
GAACTTCGGT GAAAAACCGC AGCAGGGAGG CAAACAATGA ATCAACAACT CTCCTGGCGC 1860
ACCATCGTCG GCTACAGCCT CGGTGGGGAA TTGGAGCTCG AATTTCCCCG ATCGTTCAAA 1920
CATTTGGCAA TAAAGTTTCT TAAGATTGAA TCCTGTTGCC GGTCTTGCGA TGATTATCAT 1980
ATAATTTCTG TTGAATTACG TTAAGCATGT AATAATTAAC ATGTAATGCA TGACGTTATT 2040
TATGAGATGG GTTTTTATGA TTAGAGTCCC GCAATTATAC ATTTAATACG CGATAGAAAA 2100
CAAAATATAG CGCGCAAACT AGGATAAATT ATCGCGCGCG GTGTCATCTA TGTTACTAGA 2160
TCGATCGGGA ATTAAGCTTA GATCTGCATG GGTGGAGACT TTTCAACAAA GGGTAATATC 2220
CGGAAACCTC CTCGGATTCC ATTGCCCAGC TATCTGTCAC TTTATTGTGA AGATAGTGGA 2280
AAAGGAAGGT GGCTCCTACA AATGCCATCA TTGCGATAAA GGAAAGGCCA TCGTTGAAGA 2340
TGCCTCTGCC GACAGTGGTC CCAAAGATGG ACCCCCACCC ACGAGGAGCA TCGTGGAAAA 2400
AGAAGACGTT CCAACCACGT CTTCAAAGCA AGTGGATTGA TGTGATCATC GATGGAGACT 2460
TTTCAACAAA GGGTAATATC CGGAAACCTC CTCGGATTCC ATTGCCCAGC TATCTGTCAC 2520
TTTATTGTGA AGATAGTGGA AAAGGAAGGT GGCTCCTACA AATGCCATCA TTGCGATAAA 2580
GGAAAGGCCA TCGTTGAAGA TGCCTCTGCC GACAGTGGTC CCAAAGATGG ACCCCCACCC 2640
ACGAGGAGCA TCGTGGAAAA AGAAGACGTT CCAACCACGT CTTCAAAGCA AGTGGATTGA 2700
TGTGATATCT CCACTGACGT AAGGGATGAC GCACAATCCC ACTATCCTTC GCAAGACCCT 2760
TCCTCTATAT AAGGAAGTTC ATTTCATTTG GAGAGAACAC GGGGGACTCT AGAGGATCCA 2820
GCTGAAGGCT CGACAAGGCA GTCCACGGAG GAGCTGATAT TTGGTGGACA AGCTGTGGAT 2880
AGGAGCAACC CTATCCCTAA TATACCAGCA CCACCAAGTC AGGGCAATCC CCAGATCAAG 2940
TGCAAAGGTC CGCCTTGTTT CTCCTCTGTC TCTTGATCTG ACTAATCTTG GTTTATGATT 3000
CGTTGAGTAA TTTTGGGGAA AGCTCCTTTG CTGCTCCACA CATGTCCATT CGAATTTTAC 3060
CGTGTTTAGC AAGGGCGAAA AGTTTGCATC TTGATGATTT AGCTTGACTA TGCGATTGCT 3120
TTCCTGGACC CGTGCAGCTG CGCTCGGATC TGGGGCCATT TGTTCCAGGC ACGGGATAAG 3180
CATTCAGCCA TGGCAGACGC CAAAAACATA AAGAAAGGCC CGGCGCCATT CTATCCTCTA 3240
GAGGATGGAA CCGCTGGAGA GCAACTGCAT AAGGCTATGA AGAGATACGC CCTGGTTCCT 3300
GGAACAATTG CTTTTACAGA TGCACATATC GAGGTGAACA TCACGTACGC GGAATACTTC 3360
GAAATGTCCG TTCGGTTGGC AGAAGCTATG AAACGATATG GGCTGAATAC AAATCACAGA 3420
ATCGTCGTAT GCAGTGAAAA CTCTCTTCAA TTCTTTATGC CGGTGTTGGG CGCGTTATTT 3480
ATCGGAGTTG CAGTTGCGCC CGCGAACGAC ATTTATAATG AACGTGAATT GCTCAACAGT 3540
ATGAACATTT CGCAGCCTAC CGTAGTGTTT GTTTCCAAAA AGGGGTTGCA AAAAATTTTG 3600
AACGTGCAAA AAAAATTACC AATAATCCAG AAAATTATTA TCATGGATTC TAAAACGGAT 3660
TACCAGGGAT TTCAGTCGAT GTACACGTTC GTCACATCTC ATCTACCTCC CGGTTTTAAT 3720
GAATACGATT TTGTACCAGA GTCCTTTGAT CGTGACAAAA CAATTGCACT GATAATGAAT 3780
TCCTCTGGAT CTACTGGGTT ACCTAAGGGT GTGGCCCTTC CGCATAGAAC TGCCTGCGTC 3840
AGATTCTCGC ATGCCAGAGA TCCTATTTTT GGCAATCAAA TCATTCCGGA TACTGCGATT 3900
TTAAGTGTTG TTCCATTCCA TCACGGTTTT GGAATGTTTA CTACACTCGG ATATTTGATA 3960
TGTGGATTTC GAGTCGTCTT AATGTATAGA TTTGAAGAAG AGCTGTTTTT ACGATCCCTT 4020
CAGGATTACA AAATTCAAAG TGCGTTGCTA GTACCAACCC TATTTTCATT CTTCGCCAAA 4080
AGCACTCTGA TTGACAAATA CGATTTATCT AATTTACACG AAATTGCTTC TGGGGGCGCA 4140
CCTCTTTCGA AAGAAGTCGG GGAAGCGGTT GCAAAACGCT TCCATCTTCC AGGGATACGA 4200
CAAGGATATG GGCTCACTGA GACTACATCA GCTATTCTGA TTACACCCGA GGGGGATGAT 4260
AAACCGGGCG CGGTCGGTAA AGTTGTTCCA TTTTTTGAAG CGAAGGTTGT GGATCTGGAT 4320
ACCGGGAAAA CGCTGGGCGT TAATCAGAGA GGCGAATTAT GTGTCAGAGG ACCTATGATT 4380
ATGTCCGGTT ATGTAAACAA TCCGGAAGCG ACCAACGCCT TGATTGACAA GGATGGATGG 4440
CTACATTCTG GAGACATAGC TTACTGGGAC GAAGACGAAC ACTTCTTCAT AGTTGACCGC 4500
TTGAAGTCTT TAATTAAATA CAAAGGATAT CAGGTGGCCC CCGCTGAATT GGAATCGATA 4560
TTGTTACAAC ACCGCAACAT CTTCGACGCG GGCGTGGCAG GTCTTCCCGA CGATGACGCC 4620
GGTGAACTTC CCGCCGCCGT TGTTGTTTTG GAGCACGGAA AGACGATGAC GGAAAAAGAG 4680
ATCGTGGATT ACGTCGCCAG TCAAGTAACA ACCGCGAAAA AGTTGCGCGG AGGAGTTGTG 4740
TTTGTGGACG AAGTACCGAA AGGTCTTACC GGAAAACTCG ACGCAAGAAA AATCAGAGAG 4800
ATCCTCATAA AGGCCAAGAA GGGCGGAAAG TCCAAATTGT AAAATGTAAC TGTATTCAGC 4860
GATGACGAAA TTCTTAGCTA TTGTAATCAG ATCCGCGAAT TTCCCCGATC GTTCAAACAT 4920
TTGGCAATAA AGTTTCTTAA GATTGAATCC TGTTGCCGGT CTTGCGATGA TTATCATATA 4980
ATTTCTGTTG AATTACGTTA AGCATGTAAT AATTAACATG TAATGCATGA CGTTATTTAT 5040
GAGATGGGTT TTTATGATTA GAGTCCCGCA ATTATACATT TAATACGCGA TAGAAAACAA 5100
AATATAGCGC GCAAACTAGG ATAAATTATC GCGCGCGGTG TCATCTATGT TACTAGATCG 5160
ATCGGGAATT GAGATCTCAT ATGTCGAGCT CGGGGATCTC CTTTGCCCCA GAGATCACAA 5220
TGGACGACTT CCTCTATCTC TACGATCTAG TCAGGAAGTT CGACGGAGAA GGTGACGATA 5280
CCATGTTCAC CACTGATAAT GAGAAGATTA GCCTTTTCAA TTTCAGAAAG AATGCTAACC 5340
CACAGATGGT TAGAGAGGCT TACGCAGCAG GTCTCATCAA GACGATCTAC CCGAGCAATA 5400
ATCTCCAGGA GATCAAATAC CTTCCCAAGA AGGTTAAAGA TGCAGTCAAA AGATTCAGGA 5460
CTAACTGCAT CAAGAACACA GAGAAAGATA TATTTCTCAA GATCAGAAGT ACTATTCCAG 5520
TATGGACGAT TCAAGGCTTG CTTCACAAAC CAAGGCAAGT AATAGAGATT GGAGTCTCTA 5580
AAAAGGTAGT TCCCACTGAA TCAAAGGCCA TGGAGTCAAA GATTCAAATA GAGGACCTAA 5640
CAGAACTCGC CGTAAAGACT GGCGAACAGT TCCATCGATG ATTGAGACTT TTCAACAAAG 5700
GGTAATATCC GGAAACCTCC TCGGATTCCA TTGCCCAGCT ATCTGTCACT TTATTGTGAA 5760
GATAGTGGAA AAGGAAGGTG GCTCCTACAA ATGCCATCAT TGCGATAAAG GAAAGGCCAT 5820
CGTTGAAGAT GCCTCTGCCG ACAGTGGTCC CAAAGATGGA CCCCCACCCA CGAGGAGCAT 5880
CGTGGAAAAA GAAGACGTTC CAACCACGTC TTCAAAGCAA GTGGATTGAT GTGATATCTC 5940
CACTGACGTA AGGGATGACG CACAATCCCA CTATCCTTCG CAAGACCCTT CCTCTATATA 6000
a pi pi a anTTra

AvurAAG 1 1 L. A 


1 I 1L.A1 1 IuVj 


apanriipapn 


PTPAPAZiPPT 


ppnaTPPTTT 

tuoHl L.G 111 


appaTPaTTP 
AGGAJ.GA1 1G 


c r\ c r\ 


ArtLHHb A 1 Vj o 


ATTPP APPPA 


vjVj 1 1L1 LLUU 


CCGCTTGGGT 


GGAGAGGCTA 


1 1 GGGL. 1 A1G 


£1 OA 

0 L2. 0 


/it- 1 GGGGAGA 


n pap ara atp 

AGAGAG AA x G 


OvjrV_ 1 <aG ILlu 


ATGCCGCCGT 


GTTCCGGCTG 


1 GAGCGCAGG 


0I0O 




rp/ ii i ii 1 1> i ill ii 1 ip* ppp 1 


a apapppapp 


TGTCCGGTGC 


CCTGAATGAA 


G 1 GGAGGAGG 




AGG G AGG G L. G 


GG 1A1GG IGG 


PTPPPPAPP A 
G x GGGGAGGA 


CGGGCGTTCC 


TTGCGCAGCT 


G iGG iGGAGG 


6300 


llul GAG 1 vjA 


APPPPP A APP 
AGV- Ljo AAo G 


<jAv- x oov- 1 vjG 


TATTGGGCGA AGTGCCGGGG 


pi\pp A rporp /-»/-i 

GAGGA1G1GC 


6360 


rp/-irpf-t TV rrirtrp/^ TV 

1 G 1 G A 1LI LA 


pprpmp P'TP'P'T 

GG I x GG 1 GG I 


PPPP AP A A AP* 

G L G G AG AAAG 


TATCCATCAT 


GGCTGATGCA 


ATGCGGCGGC 


6420 


TGGAlAGGL 1 


TGA1 GGGGG I 


AGG 1GGGLA1 


TCGACCACCA 


AGCGAAACAT 


^>i^pti rr»^^ ^^*» 

CGCATCGAGC 


6480 


GAG CAGGTAC 


x G GG A x G G AA 


GL GGG lLl TG 


TCGATCAGGA 


TGATCTGGAC 


GAAGAGCATC 


6540 


AGGGGG 1 GGG 


GGGAGGGGAA 


Llbli GGGGA 


GGCTCAAGGC 


GCGCATGCCC 


GACGGCGAGG 


6600 


ATG I GGTCGT 


/—i T\ /""i /-i /"i T\ rpp« P* P* 

GAG L G A TGG G 


GA1.GGL IGG I 


TGCCGAATAT 


CATGGTGGAA 


tv 7v m/~t /**</— io/^/'ii 1 1 

AATGGCCGCT 


6660 


1 J. 1L1 GGA 1 1 


p* a tph a p»tp»t 
GA1 GGAG 1 G 1 


GGGGGGG IGG 


GTGTGGCGGA 


CCGCTATCAG 


GACATAG CGT 


6720 


tp»p*p*ta r*r*r*c 
1GGG 1 AGGGG 


1GA1A1 1GG 1 


P* A A P" A r* PTTP 

G AAG AGG 1 i G 


GCGGCGAATG 


GGCTGACCGC 


TTCCTCGTGC 


6780 


1 1 1AGGG1A1 


GGCGGG 1GGG 


P A TTPP P A P P 
GA1 1 GGG AGG 


GCATCGCCTT 


CTATCGCCTT 


CTTGACGAGT 


6840 


mpmmprpp A P'P' 

1L1IL1 GAGG 


GGGAG 1 G 1 GG 


r'P"T»rpr»r' A A AT 
GGx xGGAAAl 


GACCGACCAA 


GCGACGCCCA 


ACCTGCCATC 


6900 


AL.GAGA1 1 1G 


paTTPPa ppfz 

GA1 1GGAL.GG 


PPPPPTTPTA 
GGGGGJL XGIA 


TGAAAGGTTG 


GGCTTCGGAA 


TGGTTTTCCG


6960 


GGAGGGGGGG 


I GGA x GAx L G 


1 GGAGGGGGG 


GGATCTCATG 


CTGGAGTTCT 


TCGCCCACCC 


7020 


p*aap«ap«ap , p't 
G AAG AGAGG I 


r*r* a tp , p» a p7\n 
GGA I GGAGAG 


AGGGG 1 1G 1 1 


ACACCGGACT 


GGGCGCGGGA 


TAGGATATTC 


7080 


AGATTGGGAT 


pppaTTPapp 
GGGA 1 1 GAGG 


I 1AAAGGGGG 


CGCTGAGACC 


ATGCTCAAGG 


TAGGCAATGT


7140 


GG 1 GAGGG I G 


p* a p*p*p'pv , *p , p*7 i . 
GAGGG GGGGA 


TPTATPTPPA 

1G1A1G1GGA 


GGGCATTGGT 


GGAGCGCGCT 


TCGGGGATAC 


7200 


pprnp P'TTP'T A 
GG1GG1 1G1A 


A P*TP A P A PPP 
AG 1 GAGAGGG 


P AT ATP A PPP 
VjA i A i GAGGG 


CCTCACTCCG 


CTTGATCTTG 


P»P^A A AP*ArpAT 

GCAAAGATAi 


7260 


ttp a or* p a tt 

I 1GAGGGA1 1 


T A HPT 1 A PIT* A r PP? 
1A1 IAGIAIvj 


TPTT A A TTTT* 
1 vj 1 X AAX 111 


CATTTGCAGT 


GCAGTATTTT 


GlAlTLGATG 


7320


rprrirri Ti rp^^ rp 74 TV rp 

1 1 1 A1G1AA1 


tp'P'tt nraflT 1 

1GG1 1ALAA1 


TAATAAATAT


TCAAATCAGA 


TTATTGACTG 


TCATTTGTAT


7380


p 1 a a a tp'p'tp't 
gaaa1gg igi 


I 1AA1GGA1A 


1111 1A1 1A1 


AATATTGATG 


ATATCTCAAT 


CAAAACGTAG


7440


A 'PA ATA ATA A 

A1AA1AA1AA 


TATTTATTTA 


ATATTTTTGC 


GTCGCACAGT 


GAAAATCTAT 


ATGAGATTAC


7500 


aaa a t» a r*rr* a 
AAAAIAGGGA 


P* A A P* A TT A TT 

LAALAI 1AI 1 


TA A P* A TA P» A T 

1 AAG A 1 AGA 1 


AGACATTAAC 


CCTGAGACTG 


TTGGACATCA 


7560 


AGGGG 1 AGA 1 


TP*P*TTP'A rprT* 

1L.G 1 1GAJLGG 


A T A P* r* A PPTP 

A1AGLAGG 1G 


ATTCTTGGGG 


ACAAAAGCAC 


GGTTTGGCCG 


7620 


1 1GGA1 1GG1 


PPA PP A APP A 
GGAL.GAAGGA 


GG 1 1 IGG 1A1 


ATPPTPPiPPT 


TPPATPATPT 

X vjvn X 1L1 


CATCAGGTCG


7680


A ATP A A aTTT 


PTPPA APA AP 


TP A.TPTT A PIT 
X v_.rl lull jn\j X 


CGCAACGAAA 


CCGGGGCATA 


TPPTPPaPTP 
1 GG 1 GGAG 1 G 


7740


TPAPTAPAAT 


PTPPTPTPAT 

v— X Vjv_ X l— 1 O-rt X 


PPPPPATAPT 


TAAGCCAGCC 


CCGACACCCG 


PPA APAPPPP 
v.. U AAU AG GGG 


7800




PTP A PPPPPT 

V- X OnLOUUv, X 


TGTPTPPTPP 


CGGCATCCGC 


TTACAGACAA 


PPTPTPAPPP 
uL lul O ACGo 


7860


tptpppppap 


PTPPATPTPT 

LIU Ln X O X O X 


PAPZXPPTTTT 

LnUrtUU X X X X 


CACCGTCATC 


ACCGAAACGC 


PPP APA PPA A 
Vj L vj AG AG G AA 


7920


APPPPPTPPT 


PZiTAPPPPTA 


TTTTT ATA PP 
1111 IHlnuu 


TTAATGTCAT 


GATAATAATG 


PTT^PTT A P* A 

G 1 1 1 G 1 1 AGA 


7980


CGTCAGGTGG 


CACTTTTCGG 


GGAAATGTGC 


GCGGAACCCC 


TATTTGTTTA 


TTTTTCTAAA 


8040 


TACATTCAAA 


TATGTATCCG 


CTCATGAGAC 


AATAACCCTG 


ATAAATGCTT 


CAATAATATT 


8100 


GAAAAAGGAA 


GAGTATGAGT 


ATTCAACATT 


TCCGTGTCGC 


CCTTATTCCC 


TTTTTTGCGG 


8160 


CATTTTGCCT 


TCCTGTTTTT 


GCTCACCCAG 


AAACGCTGGT 


GAAAGTAAAA 


GATGCTGAAG 


8220 


ATCAGTTGGG 


TGCACGAGTG 


GGTTACATCG 


AACTGGATCT 


CAACAGCGGT 


AAGATCCTTG 


8280 
AGAGTTTTCG CCCCGAAGAA CGTTTTCCAA TGATGAGCAC
GCGCGGTATT ATCCCGTATT GACGCCGGGC AAGAGCAACT
CTCAGAATGA CTTGGTTGAG TACTCACCAG TCACAGAAAA 
CAGTAAGAGA ATTATGCAGT GCTGCCATAA CCATGAGTGA 
TTCTGACAAC GATCGGAGGA CCGAAGGAGC TAACCGCTTT 
ATGTAACTCG CCTTGATCGT TGGGAACCGG AGCTGAATGA 
GTGACACCAC GATGCCTGTA GCAATGGCAA CAACGTTGCG 
TACTTACTCT AGCTTCCCGG CAACAATTAA TAGACTGGAT 
GACCACTTCT GCGCTCGGCC CTTCCGGCTG GCTGGTTTAT 
GTGAGCGTGG GTCTCGCGGT ATCATTGCAG CACTGGGGCC 
TCGTAGTTAT CTACACGACG GGGAGTCAGG CAACTATGGA 
CTGAGATAGG TGCCTCACTG ATTAAGCATT GGTAACTGTC 
TACTTTAGAT TGATTTAAAA CTTCATTTTT AATTTAAAAG 
TTGATAATCT CATGACCAAA ATCCCTTAAC GTGAGTTTTC 
CCGTAGAAAA GATCAAAGGA TCTTCTTGAG ATCCTTTTTT 
TGCAAACAAA AAAACCACCG CTACCAGCGG TGGTTTGTTT 
CTCTTTTTCC GAAGGTAACT GGCTTCAGCA GAGCGCAGAT 
TGTAGCCGTA GTTAGGCCAC CACTTCAAGA ACTCTGTAGC 
TGCTAATCCT GTTACCAGTG GCTGCTGCCA GTGGCGATAA 
ACTCAAGACG ATAGTTACCG GATAAGGCGC AGCGGTCGGG 
CACAGCCCAG CTTGGAGCGA ACGACCTACA CCGAACTGAG 
GAGAAAGCGC CACGCTTCCC GAAGGGAGAA AGGCGGACAG 
TCGGAACAGG AGAGCGCACG AGGGAGCTTC CAGGGGGAAA 
CTGTCGGGTT TCGCCACCTC TGACTTGAGC GTCGATTTTT 
GGAGCCTATG GAAAAACGCC AGCAACGCGG CCTTTTTACG 
CTTTTGCTCA CATGTTCTTT CCTGCGTTAT CCCCTGATTC 
CCTTTGAGTG AGCTGATACC GCTCGCCGCA GCCGAACGAC 
GCGAGGAAGC GGAAGAGCGC CCAATACGCA AACCGCCTCT 
ATTAATGCAG CTGGCACGAC AGGTTTCCCG ACTGGAAAGC 
TTAATGTGAG TTAGCTCACT CATTAGGCAC CCCAGGCTTT 
GTATGTTGTG TGGAATTGTG AGCGGATAAC AATTTCACAC 
ATTACGCCAA GCTTCCGCGG 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11784 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 



TTTTAAAGTT 
CGGTCGCCGC 
GCATCTTACG 
TAACACTGCG 
TTTGCACAAC 
AGCCATACCA 
CAAACTATTA 
GGAGGCGGAT 
TGCTGATAAA 
AGATGGTAAG 
TGAACGAAAT 
AGACCAAGTT 
GATCTAGGTG 
GTTCCACTGA 
TCTGCGCGTA 
GCCGGATCAA 
ACCAAATACT 
ACCGCCTACA 
GTCGTGTCTT 
CTGAACGGGG 
ATACCTACAG 
GTATCCGGTA 
CGCCTGGTAT 
GTGATGCTCG 
GTTCCTGGCC 
TGTGGATAAC 
CGAGCGCAGC 
CCCCGCGCGT 
GGGCAGTGAG 
ACACTTTATG 
AGGAAACAGC 



CTGCTATGTG 
ATACACTATT 
GATGGCATGA 
GCCAACTTAC 
ATGGGGGATC 
AACGACGAGC 
ACTGGCGAAC 
AAAGTTGCAG 
TCTGGAGCCG 
CCCTCCCGTA 
AGACAGATCG 
TACTCATATA 
AAGATCCTTT 
GCGTCAGACC 
ATCTGCTGCT 
GAGCTACCAA 
GTCCTTCTAG 
TACCTCGCTC 
ACCGGGTTGG 
GGTTCGTGCA 
CGTGAGCATT 
AGCGGCAGGG 
CTTTATAGTC 
TCAGGGGGGC 
TTTTGCTGGC 
CGTATTACCG 
GAGTCAGTGA 
TGGCCGATTC 
CGCAACGCAA 
CTTCCGGCTC 
TATGACCATG 



8340
8400 
8460 
8520 
8580 
8640 
8700 
8760 
8820 
8880 
8940 
9000 
9060 
9120 
9180 
9240 
9300 
9360 
9420 
9480 
9540 
9600 
9660 
9720 
9780 
9840 
9900 
9960 
10020 
10080 
10140 
10160 
(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:



GGGCCCACCA 


CTGTTGTAAC 


TTGTAAGCCA 


CTAGCTCACG 


TTCTCCATGA 


GCTCTTCTCT 


60 


CTGCTGTTTC 


TTCCTCTGCT 


AACTGCGTTA 


TGATATGACG 


TCGTATAAAT 


AATCTCACAA 


120 


TACTTCCTTA 


TTTTCAGCAT 


GGCCTCTTTT 


ATGTTTATTT 


AACAGTAGCA 


ACCAACGCCG 


180 


CTCGATGTTT 


CCTTCAAGAA 


ACGGCCACTC 


ACTATGTGGT 


GTGCAGAAGA 


ACAAATGTAA 


240 


GCAGCTCCTA 


CAGGTACCAG 


TAGTCATGTC 


AGTGTGGAAG 


CTTTCCAACC 


AACGCCTCCT 


300 


TCGAGGAACC 


TGGTCGTGCT 


GACATGAATG 


TAGGCCATGC 


AAGCACAAGC 


ACCTAACGCG 


360 


AATCATCACG 


ACGCGCCGTG 


TACTGGGCGT 


TGGTACATCA 


CACCCCGCGT 


TTGACCTGAT 


420 


CGGAAGCATG 


CGTGTGTGTT 


GGCTGCAGGA 


CCGGCTATAG 


GTTTCCTGCA 


TTGGACAGCA 


480 


GAAGCCAGTC 


ATGTTAGGCA 


CTCACGCGCT 


CCTGCCGTTT 


GATGAATCAT 


CCGGTCTTTC 


540 


GTATTGATCA 


CTAGTTCACT 


ACGCTGATAT 


AGCAAATTTT 


AAGATGTGAA 


ACCACGAGAC 


600 


GAGCGATAAA


TCTTAGACGT 


TACCTATCCA 


TATGAAGCTT 


GTGCGAAAAA 


AAGGCGTGCC 


660 


GCTGTAGCAT 


CATTCGTATA 


CACTTTTGTC 


CCCAAAGACA 


GGGATACGAA 


TCCATGCTCG 


720 


ACAGAACCCT 


CCCTTCCCTG 


CAGATAACGA 


CACTTAAGTA 


TAACAAAAGT 


AGTTGGATTA 


780 


TTTCAGAAGC 


AAAATCTCAC 


TTTTCGCTGG 


CCTTTTTGTA 


CTTTGGTTAC 


TTGAGTTCAG 


840 


ACAGTGTATG 


CTATATTGTC 


ATGTGCTGCG 


TAAGGTTTAA 


ATATGGTTCG 


ACAAATATAT 


900 


CAGTATATCA 


CTACTTTGTT 


ATGGGTGGGG 


CCTAGCACAA 


ACTTGATACA 


GCTAGGATAA 


960 


AGTTAGAACG 


ATGACTGATC 


TACTGTAAAG 


CGACACCTGT 


CCTGTTATGG 


TAGTTTAAGT 


1020 


CCATTCCTGG 


ACGACTCCAG 


ATCCAGGATA 


TGATGCTGTT 


ACATAATGCG 


ATTGTTCACA


1080 


ATAAAATTGC ATGATGTTCT


TCTACTCTTT 


AGGCAGTTTT 


GTTCAACAGG 


CAAGTTGCAT 


1140 


AATGCATGTG 


CATATATGAG 


CAGCATAATC 


ATCAATTAAT 


CATAGGTTCG 


TCATTTTAGT 


1200 


TTCACTCCTT 


CACATTATTC 


CAGCCCTTGA 


AGAAAAATGT 


AGCAGTGCTT 


GCTGTTTAAT 


1260 


AAGTGGCAGA 


GCTGTTTTCA 


CTCCACCTAC 


GCTTGTCTAG 


GACCAAAATT 


TTAATCTGTC 


1320 


ACTTTGAGCT 


AAAACTGAAG 


CACCAAACCG 


CTACAAAAGA 


ACGTAGGAGC 


TGAATTGTAA 


1380 


CTTGATGGGA 


TTACTATAGC 


AGTTGCTACA 


GTTCTAGCTA 


GCTACCTTAT 


TCTATACGCA 


1440 


TCACCCTAAC 


AACCCGGCTG 


ACTGCTGCAT 


CTGACCCCAC 


CGTCCCCTGC 


TCCAAACCAA 


1500 


CTCTCCTTTC 


CTTGCATGCA 


CTACACCCAC 


TTCCTGCAGC 


TATATATACC 


ACCATATGCC 


1560 


CATCTTATGA 


AACCATCCAC 


AAGAGGAGAA GAAACAATCA ACCAGCAACA 


CTCTTCTCTT 


1620 


ATAACATAGT 


ACAGCGAAGG 


TAACTCACGT 


CGACCATGGT 


CCGTCCTGTA 


GAAACCCCAA 


1680 


CCCGTGAAAT 


CAAAAAACTC 


GACGGCCTGT 


GGGCATTCAG 


TCTGGATCGC 


GAAAACTGTG 


1740 


GAATTGATCA 


GCGTTGGTGG 


GAAAGCGCGT 


TACAAGAAAG 


CCGGGCAATT 


GCTGTGCCAG 


1800 


GCAGTTTTAA 


CGATCAGTTC 


GCCGATGCAG 


ATATTCGTAA 


TTATGCGGGC 


AACGTCTGGT 


1860 


ATCAGCGCGA 


AGTCTTTATA 


CCGAAAGGTT 


GGGCAGGCCA 


GCGTATCGTG 


CTGCGTTTCG 


1920 


ATGCGGTCAC 


TCATTACGGC 


AAAGTGTGGG 


TCAATAATCA 


GGAAGTGATG 


GAGCATCAGG 


1980 


GCGGCTATAC 


GCCATTTGAA 


GCCGATGTCA 


CGCCGTATGT 


TATTGCCGGG 


AAAAGTGTAC 


2040 


GTATCACCGT 


TTGTGTGAAC 


AACGAACTGA 


ACTGGCAGAC 


TATCCCGCCG 


GGAATGGTGA 


2100 


TTACCGACGA 


AAACGGCAAG 


AAAAAGCAGT 


CTTACTTCCA 


TGATTTCTTT 


AACTATGCCG 


2160 
GAATCCATCG
TGGTGACGCA 
ATGGTGATGT 
GCACTAGCGG 
TCTATGAACT 
TCGGCATCCG 
ACTTTACTGG 
TGCTGATGGT 
CGCATTACCC 
TTGATGAAAC 
ACAAGCCGAA 
TACAGGCGAT 
GTATTGCCAA 
AAGCAACGCG 
ACGCTCACAC 
GATGGTATGT 
TGGCCTGGCA 
TAGCCGGGCT 
TGGATATGTA 
ATTTCGCCGA 
TCTTCACTCG 
GCATGAACTT 
GCGCACCATC 
CAAACATTTG 
TCATATAATT 
TATTTATGAG 
AAAACAAAAT 
TAGATCGATC 
TATCCGGAAA 
TGGAAAAGGA 
AAGATGCCTC 
AAAAAGAAGA 
GACTTTTCAA 
TCACTTTATT 
TAAAGGAAAG 
ACCCACGAGG 
TTGATGTGAT 
CCCTTCCTCT 



CAGCGTAATG 
TGTCGCGCAA 
CAGCGTTGAA 
GACTTTGCAA 
GTGCGTCACA 
GTCAGTGGCA 
CTTTGGTCGT 
GCACGACCAC 
TTACGCTGAA 
TGCTGCTGTC 
AGAACTGTAC 
TAAAGAGCTG 
CGAACCGGAT 
TAAACTCGAC 
CGATACCATC 
CCAAAGCGGC 
GGAGAAACTG 
GCACTCAATG 
TCACCGCGTC 
TTTTGCGACC 
CGACCGCAAA 
CGGTGAAAAA 
GTCGGCTACA 
GCAATAAAGT 
TCTGTTGAAT 
ATGGGTTTTT 
ATAGCGCGCA 
GGGAATTAAG 
CCTCCTCGGA 
AGGTGGCTCC 
TGCCGACAGT 
CGTTCCAACC 
CAAAGGGTAA 
GTGAAGATAG 
GCCATCGTTG 
AGCATCGTGG 
ATCTCCACTG 
ATATAAGGAA 



CTCTACACCA 
GACTGTAACC 
CTGCGTGATG 
GTGGTGAATC 
GCCAAAAGCC 
GTGAAGGGCG 
CATGAAGATG 
GCATTAATGG 
GAGATGCTCG 
GGCTTTAACC 
AGCGAAGAGG 
ATAGCGCGTG 
ACCCGTCCGC 
CCGACGCGTC 
AGCGATCTCT 
GATTTGGAAA 
CATCAGCCGA 
TACACCGACA 
TTTGATCGCG 
TCGCAAGGCA 
CCGAAGTCGG 
CCGCAGCAGG 
GCCTCGGTGG 
TTCTTAAGAT 
TACGTTAAGC 
ATGATTAGAG 
AACTAGGATA 
CTTAGATCTG 
TTCCATTGCC 
TACAAATGCC 
GGTCCCAAAG 
ACGTCTTCAA 
TATCCGGAAA 
TGGAAAAGGA 
AAGATGCCTC 
AAAAAGAAGA 
ACGTAAGGGA 
GTTCATTTCA 



CGCCGAACAC 
ACGCGTCTGT 
CGGATCAACA 
CGCACCTCTG 
AGACAGAGTG 
AACAGTTCCT 
CGGACTTACG 
ACTGGATTGG 
ACTGGGCAGA 
TCTCTTTAGG 
CAGTCAACGG 
ACAAAAACCA 
AAGTGCACGG 
CGATCACCTG 
TTGATGTGCT 
CGGCAGAGAA 
TTATCATCAC 
TGTGGAGTGA 
TCAGCGCCGT 
TATTGCGCGT 
CGGCTTTTCT 
GAGGCAAACA 
GGAATTGGAG 
TGAATCCTGT 
ATGTAATAAT 
TCCCGCAATT 
AATTATCGCG 
CATGGGTGGA 
CAGCTATCTG 
ATCATTGCGA 
ATGGACCCCC 
AGCAAGTGGA 
CCTCCTCGGA 
AGGTGGCTCC 
TGCCGACAGT 
CGTTCCAACC 
TGACGCACAA 
TTTGGAGAGA 



CTGGGTGGAC 
TGACTGGCAG 
GGTGGTTGCA 
GCAACCGGGT 
TGATATCTAC 
GATTAACCAC 
TGGCAAAGGA 
GGCCAACTCC 
TGAACATGGC 
CATTGGTTTC 
GGAAACTCAG 
CCCAAGCGTG 
GAATATTTCG 
CGTCAATGTA 
GTGCCTGAAC 
GGTACTGGAA 
CGAATACGGC 
AGAGTATCAG 
CGTCGGTGAA 
TGGCGGTAAC 
GCTGCAAAAA 
ATGAATCAAC 
CTCGAATTTC 
TGCCGGTCTT 
TAACATGTAA 
ATACATTTAA 
CGCGGTGTCA 
GACTTTTCAA 
TCACTTTATT 
TAAAGGAAAG 
ACCCACGAGG 
TTGATGTGAT 
TTCCATTGCC 
TACAAATGCC 
GGTCCCAAAG 
ACGTCTTCAA 
TCCCACTATC 
ACACGGGGGA 



GATATCACCG 
GTGGTGGCCA 
ACTGGACAAG 
GAAGGTTATC 
CCGCTTCGCG 
AAACCGTTCT 
TTCGATAACG 
TACCGTACCT 
ATCGTGGTGA 
GAAGCGGGCA 
CAAGCGCACT 
GTGATGTGGA 
CCACTGGCGG 
ATGTTCTGCG 
CGTTATTACG 
AAAGAACTTC 
GTGGATACGT 
TGTGCATGGC 
CAGGTATGGA 
AAGAAAGGGA 
CGCTGGACTG 
AACTCTCCTG 
CCCGATCGTT 
GCGATGATTA 
TGCATGACGT 
TACGCGATAG 
TCTATGTTAC 
CAAAGGGTAA 
GTGAAGATAG 
GCCATCGTTG 
AGCATCGTGG 
CATCGATGGA 
CAGCTATCTG 
ATCATTGCGA 
ATGGACCCCC 
AGCAAGTGGA 
CTTCGCAAGA 
CTCTAGAGGA 



2220 

2280 

2340 

2400 

2460 

2520 

2580 

2640 

2700 

2760 

2820 

2880 

2940 

3000 

3060 

3120 

3180 

3240 

3300 

3360 

3420 

3480 

3540 

3600 

3660 

3720 

3780 

3840 

3900 

3960 

4020 

4080 

4140 

4200 

4260 

4320 

4380 

4440 
TCCAGCTGAA


GGCTCGACAA 


GGCAGTCCAC 


GGAGGAGCTG 


ATATTTGGTG 


GACAAGCTGT 


4500


GGATAGGAGC 


AACCCTATCC 


CTAATATACC 


AGCACCACCA 


AGTCAGGGCA 


ATCCCCAGAT 


4560 


CAAGTGCAAA 


GGTCCGCCTT 


GTTTCTCCTC 


TGTCTCTTGA 


TCTGACTAAT 


CTTGGTTTAT 


4620 


GATTCGTTGA 


GTAATTTTGG 


GGAAAGCTCC 


TTTGCTGCTC 


CACACATGTC 


CATTCGAATT 


4680 


TTACCGTGTT 


TAGCAAGGGC 


GAAAAGTTTG 


CATCTTGATG 


ATTTAGCTTG 


ACTATGCGAT 


4740 


TGCTTTCCTG 


GACCCGTGCA 


GCTGCGCTCG 


GATCTGGGGC 


CATTTGTTCC 


AGGCACGGGA 


4800 


TAAGCATTCA 


GCCATGGCAG 


ACGCCAAAAA 


CATAAAGAAA 


GGCCCGGCGC 


CATTCTATCC 


4860 


TCTAGAGGAT 


GGAACCGCTG 


GAGAGCAACT 


GCATAAGGCT 


ATGAAGAGAT 


ACGCCCTGGT 


4920 


TCCTGGAACA 


ATTGCTTTTA 


CAGATGCACA 


TATCGAGGTG 


AACATCACGT 


ACGCGGAATA 


4980 


CTTCGAAATG 


TCCGTTCGGT 


TGGCAGAAGC 


TATGAAACGA 


TATGGGCTGA 


ATACAAATCA 


5040 


CAGAATCGTC 


GTATGCAGTG 


AAAACTCTCT 


TCAATTCTTT 


ATGCCGGTGT 


TGGGCGCGTT 


5100 


ATTTATCGGA 


GTTGCAGTTG 


CGCCCGCGAA 


CGACATTTAT 


AATGAACGTG 


AATTGCTCAA 


5160 


CAGTATGAAC 


ATTTCGCAGC 


CTACCGTAGT 


GTTTGTTTCC 


AAAAAGGGGT 


TGCAAAAAAT 


5220 


TTTGAACGTG 


CAAAAAAAAT 


TACCAATAAT 


CCAGAAAATT 


ATTATCATGG 


ATTCTAAAAC 


5280 


GGATTACCAG 


GGATTTCAGT 


CGATGTACAC 


GTTCGTCACA 


TCTCATCTAC 


CTCCCGGTTT 


5340 


TAATGAATAC 


GATTTTGTAC 


CAGAGTCCTT 


TGATCGTGAC 


AAAACAATTG 


CACTGATAAT 


5400 


GAATTCCTCT 


GGATCTACTG 


GGTTACCTAA 


GGGTGTGGCC 


CTTCCGCATA 


GAACTGCCTG 


5460 


CGTCAGATTC 


TCGCATGCCA 


GAGATCCTAT 


TTTTGGCAAT 


CAAATCATTC 


CGGATACTGC 


5520 


GATTTTAAGT 


GTTGTTCCAT 


TCCATCACGG 


TTTTGGAATG 


TTTACTACAC 


TCGGATATTT 


5580 


GATATGTGGA 


TTTCGAGTCG 


TCTTAATGTA 


TAGATTTGAA 


GAAGAGCTGT 


TTTTACGATC 


5640 


CCTTCAGGAT 


TACAAAATTC 


AAAGTGCGTT 


GCTAGTACCA 


ACCCTATTTT 


CATTCTTCGC 


5700 


CAAAAGCACT 


CTGATTGACA 


AATACGATTT 


ATCTAATTTA 


CACGAAATTG 


CTTCTGGGGG 


5760 


CGCACCTCTT 


TCGAAAGAAG 


TCGGGGAAGC 


GGTTGCAAAA 


CGCTTCCATC 


TTCCAGGGAT 


5820 


ACGACAAGGA 


TATGGGCTCA 


CTGAGACTAC 


ATCAGCTATT 


CTGATTACAC 


CCGAGGGGGA 


5880 


TGATAAACCG 


GGCGCGGTCG 


GTAAAGTTGT 


TCCATTTTTT 


GAAGCGAAGG 


TTGTGGATCT 


5940 


GGATACCGGG 


AAAACGCTGG 


GCGTTAATCA 


GAGAGGCGAA 


TTATGTGTCA 


GAGGACCTAT 


6000 


GATTATGTCC 


GGTTATGTAA 


ACAATCCGGA 


AGCGACCAAC 


GCCTTGATTG 


ACAAGGATGG 


6060 


ATGGCTACAT 


TCTGGAGACA 


TAGCTTACTG 


GGACGAAGAC 


GAACACTTCT 


TCATAGTTGA 


6120 


CCGCTTGAAG 


TCTTTAATTA 


AATACAAAGG 


ATATCAGGTG 


GCCCCCGCTG 


AATTGGAATC 


6180 


GATATTGTTA 


CAACACCCCA 


ACATCTTCGA 


CGCGGGCGTG 


GCAGGTCTTC 


CCGACGATGA 


6240 


CGCCGGTGAA 


CTTCCCGCCG 


CCGTTGTTGT 


TTTGGAGCAC 


GGAAAGACGA 


TGACGGAAAA 


6300 


AGAGATCGTG 


GATTACGTCG 


CCAGTCAAGT 


AACAACCGCG 


AAAAAGTTGC 


GCGGAGGAGT 


6360 


TGTGTTTGTG 


GACGAAGTAC 


CGAAAGGTCT 


TACCGGAAAA 


CTCGACGCAA 


GAAAAATCAG 


6420 


AGAGATCCTC 


ATAAAGGCCA 


AGAAGGGCGG 


AAAGTCCAAA 


TTGTAAAATG 


TAACTGTATT 


6480 


CAGCGATGAC 


GAAATTCTTA 


GCTATTGTAA 


TCAGATCCGC 


GAATTTCCCC 


GATCGTTCAA 


6540 


ACATTTGGCA 


ATAAAGTTTC 


TTAAGATTGA 


ATCCTGTTGC 


CGGTCTTGCG 


ATGATTATCA 


6600 


TATAATTTCT 


GTTGAATTAC 


GTTAAGCATG 


TAATAATTAA 


CATGTAATGC 


ATGACGTTAT 


6660 


TTATGAGATG 


GGTTTTTATG 


ATTAGAGTCC 


CGCAATTATA 


CATTTAATAC 


GCGATAGAAA 


6720 
ACAAAATATA
ATCGATCGGG 
ACAATGGACG 
GATACCATGT 
AACCCACAGA 
AATAATCTCC 
AGGACTAACT 
CCAGTATGGA 
TCTAAAAAGG 
CTAACAGAAC 
AAAGGGTAAT 
TGAAGATAGT 
CCATCGTTGA 
GCATCGTGGA 
TCTCCACTGA 
TATAAGGAAG 
ATTGAACAAG 
TATGACTGGG 
CAGGGGCGCC 
GACGAGGCAG 
GACGTTGTCA 
CTCCTGTCAT 
CGGCTGCATA 
GAGCGAGCAC 
CATCAGGGGC 
GAGGATCTCG 
CGCTTTTCTG 
GCGTTGGCTA 
GTGCTTTACG 
GAGTTCTTCT 
CATCACGAGA 
TCCGGGACGC 
ACCCCAACAG 
ATTCAGATTG 
ATGTCCTCAG 
ATACCGTGCT 
ATATTTGACG 
GATCTTTATG 



GCGCGCAAAC 
AATTGAGATC 
ACTTCCTCTA 
TCACCACTGA 
TGGTTAGAGA 
AGGAGATCAA 
GCATCAAGAA 
CGATTCAAGG 
TAGTTCCCAC 
TCGCCGTAAA 
ATCCGGAAAC 
GGAAAAGGAA 
AGATGCCTCT 
AAAAGAAGAC 
CGTAAGGGAT 
TTCATTTCAT 
ATGGATTGCA 
CACAACAGAC 
CGGTTCTTTT 
CGCGGCTATC 
CTGAAGCGGG 
CTCACCTTGC 
CGCTTGATCC 
GTACTCGGAT 
TCGCGCCAGC 
TCGTGACCCA 
GATTCATCGA 
CCCGTGATAT 
GTATCGCCGC 
GAGCGGGACT 
TTTCGATTCC 
CGGCTGGATG 
AGGTGGATGG 
GGATGGGATT 
CGTCGAGCCC 
TGTAACTGAG 
CATTTATTAG 
TAATTCGTTA 



TAGGATAAAT 
TCATATGTCG 
TCTCTACGAT 
TAATGAGAAG 
GGCTTACGCA 
ATACCTTCCC 
CACAGAGAAA 
CTTGCTTCAC 
TGAATCAAAG 
GACTGGCGAA 
CTCCTCGGAT 
GGTGGCTCCT 
GCCGACAGTG 
GTTCCAACCA 
GACGCACAAT 
TTGGAGAGGA 
CGCAGGTTCT 
AATCGGCTGC 
TGTCAAGACC 
GTGGCTGGCC 
AAGGGACTGG 
TCCTGCCGAG 
GGCTACCTGC 
GGAAGCCGGT 
CGAACTGTTC 
TGGCGATGCC 
CTGTGGCCGG 
TGCTGAAGAG 
TCCCGATTCG 
CTGGGGTTCG 
ACCGCCGCCT 
ATCCTCCAGC 
ACAGACCCGT 
GAGCTTAAAG 
GGCATCTATG 
ACCGGATATG 
TATGTGTTAA 
CAATTAATAA 



TATCGCGCGC 
AGCTCGGGGA 
CTAGTCAGGA 
ATTAGCCTTT 
GCAGGTCTCA 
AAGAAGGTTA 
GATATATTTC 
AAACCAAGGC 
GCCATGGAGT 
CAGTTCCATC 
TCCATTGCCC 
ACAAATGCCA 
GTCCCAAAGA 
CGTCTTCAAA 
CCCACTATCC 
CACGCTGACA 
CCGGCCGCTT 
TCTGATGCCG 
GACCTGTCCG 
ACGACGGGCG 
CTGCTATTGG 
AAAGTATCCA 
CCATTCGACC 
CTTGTCGATC 
GCCAGGCTCA 
TGCTTGCCGA 
CTGGGTGTGG 
CTTGGCGGCG 
CAGCGCATCG 
AAATGACCGA 
TCTATGAAAG 
GCGGGGATCT 
TCTTACACCG 
CCGGCGCTGA 
TCGAGGGCAT 
AGGCCCTCAC 
TTTTCATTTG 
ATATTCAAAT 



GGTGTCATCT 
TCTCCTTTGC 
AGTTCGACGG 
TCAATTTCAG 
TCAAGACGAT 
AAGATGCAGT 
TCAAGATCAG 
AAGTAATAGA 
CAAAGATTCA 
GATGATTGAG 
AGCTATCTGT 
TCATTGCGAT 
TGGACCCCCA 
GCAAGTGGAT 
TTCGCAAGAC 
AGCTCGGATC 
GGGTGGAGAG 
CCGTGTTCCG 
GTGCCCTGAA 
TTCCTTGCGC 
GCGAAGTGCC 
TCATGGCTGA 
ACCAAGCGAA 
AGGATGATCT 
AGGCGCGCAT 
ATATCATGGT 
CGGACCGCTA 
AATGGGCTGA 
CCTTCTATCG 
CCAAGCGACG 
GTTGGGCTTC 
CATGCTGGAG 
GACTGGGCGC 
GACCATGCTC 
TGGTGGAGCG 
TCCGCTTGAT 
CAGTGCAGTA 
CAGATTATTG 



ATGTTACTAG 
CCCAGAGATC 
AGAAGGTGAC 
AAAGAATGCT 
CTACCCGAGC 
CAAAAGATTC 
AAGTACTATT 
GATTGGAGTC 
AATAGAGGAC 
ACTTTTCAAC 
CACTTTATTG 
AAAGGAAAGG 
CCCACGAGGA 
TGATGTGATA 
CCTTCCTCTA 
CTTTAGCATG 
GCTATTCGGC 
GCTGTCAGCG 
TGAACTGCAG 
AGCTGTGCTC 
GGGGCAGGAT 
TGCAATGCGG 
ACATCGCATC 
GGACGAAGAG 
GCCCGACGGC 
GGAAAATGGC 
TCAGGACATA 
CCGCTTCCTC 
CCTTCTTGAC 
CCCAACCTGC 
GGAATCGTTT 
TTCTTCGCCC 
GGGATAGGAT 
AAGGTAGGCA 
CGCTTCGGGG 
CTTGGCAAAG 
TTTTCTATTC 
ACTGTCATTT 



6780

6840 

6900 

6960 

7020 

7080 

7140 

7200 

7260 

7320 

7380 
7500
7860

7920 
GTATCAAATC


GTGTTTAATG 


GATATTTTTA 


TTATAATATT 


GATGATATCT 


CAATCAAAAC 


9060 


GTAGATAATA 


ATAATATTTA 


TTTAATATTT 


TTGCGTCGCA 


CAGTGAAAAT 


CTATATGAGA 


9120 


TTACAAAATA 


CCGACAACAT 


TATTTAAGAT 


ACATAGACAT 


TAACCCTGAG 


ACTGTTGGAC 


9180 


ATCAACGGGT 


AGATTCCTTC 


ATGCATAGCA 


CCTCATTCTT 


GGGGACAAAA 


GCACGGTTTG 


9240 


GCCGTTCCAT 


TGCTGCACGA 


ACGAGCTTTG 


CTATATCCTC 


GGGTTGGATC 


ATCTCATCAG 


9300 


GTCCAATCAA 


ATTTGTCCAA 


GAACTCATGT 


TAGTCGCAAC 


GAAACCGGGG 


CATATGGTGC 


9360 


ACTCTCAGTA 


CAATCTGCTC 


TGATGCCGCA 


TAGTTAAGCC 


AGCCCCGACA 


CCCGCCAACA 


9420 


CCCGCTGACG 


CGCCCTGACG 


GGCTTGTCTG 


CTCCCGGCAT 


CCGCTTACAG 


ACAAGCTGTG 


9480 


ACCGTCTCCG 


GGAGCTGCAT 


GTGTCAGAGG 


TTTTCACCGT 


CATCACCGAA 


ACGCGCGAGA 


9540 


CGAAAGGGCC 


TCGTGATACG 


CCTATTTTTA TAGGTTAATG TCATGATAAT AATGGTTTCT


9600 


TAGACGTCAG 


GTGGCACTTT 


TCGGGGAAAT 


GTGCGCGGAA 


CCCCTATTTG 


TTTATTTTTC 


9660 


TAAATACATT 


CAAATATGTA 


TCCGCTCATG 


AGACAATAAC 


CCTGATAAAT 


GCTTCAATAA 


9720 


TATTGAAAAA 


GGAAGAGTAT 


GAGTATTCAA 


CATTTCCGTG 


TCGCCCTTAT 


TCCCTTTTTT 


9780 


GCGGCATTTT 


GCCTTCCTGT 


TTTTGCTCAC 


CCAGAAACGC 


TGGTGAAAGT 


AAAAGATGCT 


9840 


GAAGATCAGT 


TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 


CGGTAAGATC 


9900 


CTTGAGAGTT 


TTCGCCCCGA AGAACGTTTT 


CCAATGATGA 


GCACTTTTAA 


AGTTCTGCTA 


9960 


TGTGGCGCGG 


TATTATCCCG TATTGACGCC GGGCAAGAGC AACTCGGTCG 


CCGCATACAC 


10020 


TATTCTCAGA 


ATGACTTGGT 


TGAGTACTCA 


CCAGTCACAG 


AAAAGCATCT 


TACGGATGGC 


10080 


ATGACAGTAA 


GAGAATTATG 


CAGTGCTGCC ATAACCATGA GTGATAACAC 


TGCGGCCAAC 


10140 


TTACTTCTGA 


CAACGATCGG 


AGGACCGAAG 


GAGCTAACCG 


CTTTTTTGCA 


CAACATGGGG 


10200 


GATCATGTAA 


CTCGCCTTGA 


TCGTTGGGAA 


CCGGAGCTGA 


ATGAAGCCAT 


ACCAAACGAC 


10260 


GAGCGTGACA 


CCACGATGCC 


TGTAGCAATG 


GCAACAACGT 


TGCGCAAACT 


ATTAACTGGC 


10320 


GAACTACTTA 


CTCTAGCTTC 


CCGGCAACAA 


TTAATAGACT 


GGATGGAGGC 


GGATAAAGTT 


10380 


GCAGGACCAC 


TTCTGCGCTC 


GGCCCTTCCG 


GCTGGCTGGT 


TTATTGCTGA 


TAAATCTGGA 


10440 


GCCGGTGAGC 


GTGGGTCTCG 


CGGTATCATT 


GCAGCACTGG 


GGCCAGATGG 


TAAGCCCTCC 


10500 


CGTATCGTAG 


TTATCTACAC 


GACGGGGAGT 


CAGGCAACTA 


TGGATGAACG 


AAATAGACAG 


10560 


ATCGCTGAGA 


TAGGTGCCTC 


ACTGATTAAG 


CATTGGTAAC 


TGTCAGACCA 


AGTTTACTCA 


10620 


TATATACTTT 


AGATTGATTT 


AAAACTTCAT 


TTTTAATTTA 


AAAGGATCTA 


GGTGAAGATC 


10680 


CTTTTTGATA 


ATCTCATGAC 


CAAAATCCCT 


TAACGTGAGT 


TTTCGTTCCA 


CTGAGCGTCA 


10740 


GACCCCGTAG 


AAAAGATCAA 


AGGATCTTCT 


TGAGATCCTT 


TTTTTCTGCG 


CGTAATCTGC 


10800 


TGCTTGCAAA 


CAAAAAAACC 


ACCGCTACCA 


GCGGTGGTTT 


GTTTGCCGGA 


TCAAGAGCTA 


10860 


CCAACTCTTT 


TTCCGAAGGT 


AACTGGCTTC 


AGCAGAGCGC


AGATACCAAA 


TACTGTCCTT 


10920 


CTAGTGTAGC 


CGTAGTTAGG 


CCACCACTTC AAGAACTCTG TAGCACCGCC 


TACATACCTC 


10980 


GCTCTGCTAA 


TCCTGTTACC 


AGTGGCTGCT 


GCCAGTGGCG 


ATAAGTCGTG 


TCTTACCGGG 


11040 


TTGGACTCAA 


GACGATAGTT 


ACCGGATAAG 


GCGCAGCGGT 


CGGGCTGAAC 


GGGGGGTTCG 


11100 


TGCACACAGC 


CCAGCTTGGA 


GCGAACGACC 


TACACCGAAC 


TGAGATACCT 


ACAGCGTGAG 


11160 


CATTGAGAAA 


GCGCCACGCT 


TCCCGAAGGG 


AGAAAGGCGG 


ACAGGTATCC 


GGTAAGCGGC 


11220 


AGGGTCGGAA 


CAGGAGAGCG 


CACGAGGGAG 


CTTCCAGGGG 


GAAACGCCTG 


GTATCTTTAT 


11280 
AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT
GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT 
TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG 
ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA 
GTGAGCGAGG AAGCGGAAGA GCGCCCAATA CGCAAACCGC 
ATTCATTAAT GCAGCTGGCA CGACAGGTTT CCCGACTGGA 
GCAATTAATG TGAGTTAGCT CACTCATTAG GCACCCCAGG 
GCTCGTATGT TGTGTGGAAT TGTGAGCGGA TAACAATTTC 
CATGATTACG CCAAGCTTCC GCGG 
(2) INFORMATION FOR SEQ ID NO: 10: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11991 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
GGGCCCACCA CTGTTGTAAC TTGTAAGCCA CTAGCTCACG 
CTGCTGTTTC TTCCTCTGCT AACTGCGTTA TGATATGACG 
TACTTCCTTA TTTTCAGCAT GGCCTCTTTT ATGTTTATTT 
CTCGATGTTT CCTTCAAGAA ACGGCCACTC ACTATGTGGT 
GCAGCTCCTA CAGGTACCAG TAGTCATGTC AGTGTGGAAG 
TCGAGGAACC TGGTCGTGCT GACATGAATG TAGGCCATGC 
AATCATCACG ACGCGCCGTG TACTGGGCGT TGGTACATCA 
CGGAAGCATG CGTGTGTGTT GGCTGCAGGA CCGGCTATAG 
GAAGCCAGTC ATGTTAGGCA CTCACGCGCT CCTGCCGTTT 
GTATTGATCA CTAGTTCACT ACGCTGATAT AGCAAATTTT 
GAGCGATAAA TCTTAGACGT TACCTATCCA TATGAAGCTT 
GCTGTAGCAT CATTCGTATA CACTTTTGTC CCCAAAGACA 
ACAGAACCCT CCCTTCCCTG CAGATAACGA CACTTAAGTA 
TTTCAGAAGC AAAATCTCAC TTTTCGCTGG CCTTTTTGTA 
ACAGTGTATG CTATATTGTC ATGTGCTGCG TAAGGTTTAA 
CAGTATATCA CTACTTTGTT ATGGGTGGGG CCTAGCACAA 
AGTTAGAACG ATGACTGATC TACTGTAAAG CGACACCTGT 
CCATTCCTGG ACGACTCCAG ATCCAGGATA TGATGCTGTT 
ATAAAATTGC ATGATGTTCT TCTACTCTTT AGGCAGTTTT 
AATGCATGTG CATATATGAG CAGCATAATC ATCAATTAAT 
TTCACTCCTT CACATTATTC CAGCCCTTGA AGAAAAATGT 



TTTTGTGATG 
TACGGTTCCT 
ATTCTGTGGA 
CGACCGAGCG 
CTCTCCCCGC 
AAGCGGGCAG 
CTTTACACTT 
ACACAGGAAA 



CTCGTCAGGG 
GGCCTTTTGC 
TAACCGTATT 
CAGCGAGTCA 
GCGTTGGCCG 
TGAGCGCAAC 
TATGCTTCCG 
CAGCTATGAC 



TTCTCCATGA 
TCGTATAAAT 
AACAGTAGCA 
GTGCAGAAGA 
CTTTCCAACC 
AAGCACAAGC 
CACCCCGCGT 
GTTTCCTGCA 
GATGAATCAT 
AAGATGTGAA 
GTGCGAAAAA 
GGGATACGAA 
TAACAAAAGT 
CTTTGGTTAC 
ATATGGTTCG 
ACTTGATACA 
CCTGTTATGG 
ACATAATGCG 
GTTCAACAGG 
CATAGGTTCG 
AGCAGTGCTT 



11340 
11400 
11460 
11520 
11580 
11640 
11700 
11760 
11784 



GCTCTTCTCT 
AATCTCACAA 
ACCAACGCCG 
ACAAATGTAA 
AACGCCTCCT 
ACCTAACGCG 
TTGACCTGAT 
TTGGACAGCA 
CCGGTCTTTC 
ACCACGAGAC 
AAGGCGTGCC 
TCCATGCTCG 
AGTTGGATTA 
TTGAGTTCAG 
ACAAATATAT 
GCTAGGATAA 
TAGTTTAAGT 
ATTGTTCACA 
CAAGTTGCAT 
TCATTTTAGT 
GCTGTTTAAT 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
AAGTGGCAGA
ACTTTGAGCT 
CTTGATGGGA 
TCACCCTAAC 
CTCTCCTTTC 
CATCTTATGA 
ATAACATAGT 
TCTTGATCTG 
CTGCTCCACA 
TTGATGATTT 
CCATGGTCCG 
CATTCAGTCT 
AAGAAAGCCG 
TTCGTAATTA 
CAGGCCAGCG 
ATAATCAGGA 
CGTATGTTAT 
GGCAGACTAT 
ACTTCCATGA 
CGAACACCTG 
CGTCTGTTGA 
ATCAACAGGT 
ACCTCTGGCA 
CAGAGTGTGA 
AGTTCCTGAT 
ACTTACGTGG 
GGATTGGGGC 
GGGCAGATGA 
CTTTAGGCAT 
TCAACGGGGA 
AAAACCACCC 
TGCACGGGAA 
TCACCTGCGT 
ATGTGCTGTG 
CAGAGAAGGT 
TCATCACCGA 
GGAGTGAAGA 
GCGCCGTCGT 



GCTGTTTTCA 
AAAACTGAAG 
TTACTATAGC 
AACCCGGCTG 
CTTGCATGCA 
AACCATCCAC 
ACAGCGAAGG 
ACTAATCTTG 
CATGTCCATT 
AGCTTGACTA 
TCCTGTAGAA 
GGATCGCGAA 
GGCAATTGCT 
TGCGGGCAAC 
TATCGTGCTG 
AGTGATGGAG 
TGCCGGGAAA 
CCCGCCGGGA 
TTTCTTTAAC 
GGTGGACGAT 
CTGGCAGGTG 
GGTTGCAACT 
ACCGGGTGAA 
TATCTACCCG 
TAACCACAAA 
CAAAGGATTC 
CAACTCCTAC 
ACATGGCATC 
TGGTTTCGAA 
AACTCAGCAA 
AAGCGTGGTG 
TATTTCGCCA 
CAATGTAATG 
CCTGAACCGT 
ACTGGAAAAA 
ATACGGCGTG 
GTATCAGTGT 
CGGTGAACAG 



CTCCACCTAC 
CACCAAACCG 
AGTTGCTACA 
ACTGCTGCAT 
CTACACCCAC 
AAGAGGAGAA 
TAACTCACAG 
GTTTATGATT 
CGAATTTTAC 
TGCGATTGCT 
ACCCCAACCC 
AACTGTGGAA 
GTGCCAGGCA 
GTCTGGTATC 
CGTTTCGATG 
CATCAGGGCG 
AGTGTACGTA 
ATGGTGATTA
TATGCCGGAA 
ATCACCGTGG 
GTGGCCAATG 
GGACAAGGCA 
GGTTATCTCT 
CTTCGCGTCG 
CCGTTCTACT 
GATAACGTGC 
CGTACCTCGC 
GTGGTGATTG 
GCGGGCAACA 
GCGCACTTAC 
ATGTGGAGTA 
CTGGCGGAAG 
TTCTGCGACG 
TATTACGGAT 
GAACTTCTGG 
GATACGTTAG 
GCATGGCTGG 
GTATGGAATT 



GCTTGTCTAG 
CTACAAAAGA 
GTTCTAGCTA 
CTGACCCCAC 
TTCCTGCAGC 
GAAACAATCA 
TGCAAAGGTC 
CGTTGAGTAA 
CGTGTTTAGC 
TTCCTGGACC 
GTGAAATCAA 
TTGATCAGCG 
GTTTTAACGA 
AGCGCGAAGT 
CGGTCACTCA 
GCTATACGCC 
TCACCGTTTG 
CCGACGAAAA 
TCCATCGCAG 
TGACGCATGT 
GTGATGTCAG 
CTAGCGGGAC 
ATGAACTGTG 
GCATCCGGTC 
TTACTGGCTT 
TGATGGTGCA 
ATTACCCTTA 
ATGAAACTGC 
AGCCGAAAGA 
AGGCGATTAA 
TTGCCAACGA 
CAACGCGTAA 
CTCACACCGA 
GGTATGTCCA 
CCTGGCAGGA 
CCGGGCTGCA 
ATATGTATCA 
TCGCCGATTT 



GACCAAAATT 
ACGTAGGAGC 
GCTACCTTAT 
CGTCCCCTGC 
TATATATACC 
ACCAGCAACA 
CGCCTTGTTT 
TTTTGGGGAA 
AAGGGCGAAA 
CGTGCAGCTG 
AAAACTCGAC 
TTGGTGGGAA 
TCAGTTCGCC 
CTTTATACCG 
TTACGGCAAA 
ATTTGAAGCC 
TGTGAACAAC 
CGGCAAGAAA 
CGTAATGCTC 
CGCGCAAGAC 
CGTTGAACTG 
TTTGCAAGTG 
CGTCACAGCC 
AGTGGCAGTG 
TGGTCGTCAT 
CGACCACGCA 
CGCTGAAGAG 
TGCTGTCGGC 
ACTGTACAGC 
AGAGCTGATA 
ACCGGATACC 
ACTCGACCCG 
TACCATCAGC 
AAGCGGCGAT 
GAAACTGCAT 
CTCAATGTAC 
CCGCGTCTTT 
TGCGACCTCG 



TTAATCTGTC 
TGAATTGTAA 
TCTATACGCA 
TCCAAACCAA 
ACCATATGCC 
CTCTTCTCTT 
CTCCTCTGTC 
AGCTCCTTTG 
AGTTTGCATC 
CGCTCGTCGA 
GGCCTGTGGG 
AGCGCGTTAC 
GATGCAGATA 
AAAGGTTGGG 
GTGTGGGTCA 
GATGTCACGC 
GAACTGAACT 
AAGCAGTCTT 
TACACCACGC 
TGTAACCACG 
CGTGATGCGG 
GTGAATCCGC 
AAAAGCCAGA 
AAGGGCGAAC 
GAAGATGCGG 
TTAATGGACT 
ATGCTCGACT 
TTTAACCTCT 
GAAGAGGCAG 
GCGCGTGACA 
CGTCCGCAAG 
ACGCGTCCGA 
GATCTCTTTG 
TTGGAAACGG 
CAGCCGATTA 
ACCGACATGT 
GATCGCGTCA 
CAAGGCATAT 



1320 

1380 

1440 

1500 

1560 

1620 

1680 

1740 

1800 

1860 

1920 

1980 

2040 

2100 

2160 

2220 

2280 

2340 

2400 

2460 

2520 

2580 

2640 

2700 

2760 

2820 

2880 

2940 

3000 

3060 

3120 

3180 

3240 

3300 

3360 

3420 

3480 

3540 
TGCGCGTTGG
CTTTTCTGCT 
GCAAACAATG 
ATTGGAGCTC 
ATCCTGTTGC 
TAATAATTAA 
CGCAATTATA 
TATCGCGCGC 
GGGTGGAGAC 
CTATCTGTCA 
ATTGCGATAA 
GACCCCCACC 
AAGTGGATTG 
CCTCGGATTC 
TGGCTCCTAC 
CGACAGTGGT 
TCCAACCACG 
CGCACAATCC 
GGAGAGAACA 
GGAGCTGATA 
ACCACCAAGT 
CTCTTGATCT 
GCTGCTCCAC 
CTTGATGATT 
CTGGGGCCAT 
AAAGAAAGGC
TAAGGCTATG 
CGAGGTGAAC 
GAAACGATAT 
ATTCTTTATG 
CATTTATAAT 
TGTTTCCAAA 
GAAAATTATT 
CGTCACATCT 
TCGTGACAAA 
TGTGGCCCTT 
TGGCAATCAA 
TGGAATGTTT 



CGGTAACAAG 
GCAAAAACGC 
AATCAACAAC 
GAATTTCCCC 
CGGTCTTGCG 
CATGTAATGC 
CATTTAATAC 
GGTGTCATCT 
TTTTCAACAA 
CTTTATTGTG 
AGGAAAGGCC 
CACGAGGAGC 
ATGTGATCAT 
CATTGCCCAG 
AAATGCCATC 
CCCAAAGATG 
TCTTCAAAGC 
CACTATCCTT 
CGGGGGACTC 
TTTGGTGGAC 
CAGGGCAATC 
GACTAATCTT 
ACATGTCCAT 
TAGCTTGACT 
TTGTTCCAGG 
CCGGCGCCAT 
AAGAGATACG 
ATCACGTACG 
GGGCTGAATA 
CCGGTGTTGG 
GAACGTGAAT 
AAGGGGTTGC 
ATCATGGATT 
CATCTACCTC 
ACAATTGCAC 
CCGCATAGAA 
ATCATTCCGG 
ACTACACTCG 



AAAGGGATCT 
TGGACTGGCA 
TCTCCTGGCG 
GATCGTTCAA 
ATGATTATCA 
ATGACGTTAT 
GCGATAGAAA 
ATGTTACTAG 
AGGGTAATAT 
AAGATAGTGG 
ATCGTTGAAG 
ATCGTGGAAA 
CGATGGAGAC 
CTATCTGTCA 
ATTGCGATAA 
GACCCCCACC 
AAGTGGATTG 
CGCAAGACCC 
TAGAGGATCC 
AAGCTGTGGA 
CCCAGATCAA 
GGTTTATGAT 
TCGAATTTTA 
ATGCGATTGC 
CACGGGATAA 
TCTATCCTCT 
CCCTGGTTCC 
CGGAATACTT 
CAAATCACAG 
GCGCGTTATT 
TGCTCAACAG 
AAAAAATTTT 
CTAAAACGGA 
CCGGTTTTAA 
TGATAATGAA 
CTGCCTGCGT 
ATACTGCGAT 
GATATTTGAT 



TCACTCGCGA 
TGAACTTCGG 
CACCATCGTC 
ACATTTGGCA 
TATAATTTCT 
TTATGAGATG 
ACAAAATATA 
ATCGATCGGG 
CCGGAAACCT 
AAAAGGAAGG 
ATGCCTCTGC 
AAGAAGACGT 
TTTTCAACAA 
CTTTATTGTG 
AGGAAAGGCC 
CACGAGGAGC 
ATGTGATATC 
TTCCTCTATA 
AGCTGAAGGC 
TAGGAGCAAC 
GTGCAAAGGT 
TCGTTGAGTA 
CCGTGTTTAG 
TTTCCTGGAC 
GCATTCAGCC 
AGAGGATGGA 
TGGAACAATT 
CGAAATGTCC 
AATCGTCGTA 
TATCGGAGTT 
TATGAACATT 
GAACGTGCAA 
TTACCAGGGA 
TGAATACGAT 
TTCCTCTGGA 
CAGATTCTCG 
TTTAAGTGTT 
ATGTGGATTT 



CCGCAAACCG 
TGAAAAACCG 
GGCTACAGCC 
ATAAAGTTTC 
GTTGAATTAC 
GGTTTTTATG 
GCGCGCAAAC 
AATTAAGCTT 
CCTCGGATTC 
TGGCTCCTAC 
CGACAGTGGT 
TCCAACCACG 
AGGGTAATAT 
AAGATAGTGG 
ATCGTTGAAG 
ATCGTGGAAA 
TCCACTGACG 
TAAGGAAGTT 
TCGACAAGGC 
CCTATCCCTA 
CCGCCTTGTT 
ATTTTGGGGA 
CAAGGGCGAA 
CCGTGCAGCT 
ATGGCAGACG 
ACCGCTGGAG 
GCTTTTACAG 
GTTCGGTTGG 
TGCAGTGAAA 
GCAGTTGCGC 
TCGCAGCCTA 
AAAAAATTAC 
TTTCAGTCGA 
TTTGTACCAG 
TCTACTGGGT 
CATGCCAGAG 
GTTCCATTCC 
CGAGTCGTCT 



AAGTCGGCGG 
CAGCAGGGAG 
TCGGTGGGGA 
TTAAGATTGA 
GTTAAGCATG 
ATTAGAGTCC 
TAGGATAAAT 
AGATCTGCAT 
CATTGCCCAG 
AAATGCCATC 
CCCAAAGATG 
TCTTCAAAGC 
CCGGAAACCT 
AAAAGGAAGG 
ATGCCTCTGC 
AAGAAGACGT 
TAAGGGATGA 
CATTTCATTT 
AGTCCACGGA 
ATATACCAGC 
TCTCCTCTGT 
AAGCTCCTTT 
AAGTTTGCAT 
GCGCTCGGAT 
CCAAAAACAT 
AGCAACTGCA 
ATGCACATAT 
CAGAAGCTAT 
ACTCTCTTCA 
CCGCGAACGA 
CCGTAGTGTT 
CAATAATCCA 
TGTACACGTT 
AGTCCTTTGA 
TACCTAAGGG 
ATCCTATTTT 
ATCACGGTTT 
TAATGTATAG 
4260
A T 1 '! " l 'P A A P A A
AX 1 1 vjnnAjnn 


GAGPTGTTTT 


TACGATCCCT 


TCAGGATTAC 


AAAATTCAAA 


GTGCGTTGCT 


5880 


nVj x WnnL. V— 


PTATTTTCAT 

v» ini xxx v*rt x 


TCTTCGCCAA 


AAGCACTCTG 


ATTGACAAAT 


ACGATTTATC 


5940 


t 2i a r r r r r p a p a p 


GAAATTGCTT


CTGGGGGCGC 


ACCTCTTTCG 


AAAGAAGTCG 


GGGAAGCGGT 


6000 


TGPAAAAPGC 


TTCCATCTTC 


CAGGGATACG 


ACAAGGATAT 


GGGCTCACTG 


AGACTACATC 


6060 


APPTATTPTG 


ATTAPAPPCG 


AGGGGGATGA 


TAAACCGGGC 


GCGGTCGGTA 


AAGTTGTTCC 


6120 


A TTTTTTH A A 
Hi 1 11 J. 1 UnM 


G PG A AGGTTG 


TGGATCTGGA 


TACCGGGAAA 


ACGCTGGGCG 


TTAATCAGAG 


6180 


AriPPP A ATTA 
nOOLVjnn 1 In 


TGTGTPAGAG 


GACCTATGAT


TATGTCCGGT 


TATGTAAACA 


ATCCGGAAGC 


6240 


VjnV- LruiV^ UUL 


TTGATTGACA


AGGATGGATG 


GCTACATTCT 


GGAGACATAG 


CTTACTGGGA 


6300 


r*PA APAPPA A 


PAPTTPTTCA 

1 1 V— X X V»n 


TAGTTGACCG 


CTTGAAGTCT 


TTAATTAAAT 


ACAAAGGATA 


6360 


1 LnVJVJ X UUt v_ 


PPPGPTG AAT 

V* V— i LOU 1 VJnn X 


TGGAATCGAT 


ATTGTTACAA 


CACCCCAACA 


TCTTCGACGC 


6420 


ppppgtggpa 


GGTPTTCCCG 

vj \j x \ i j. ^ — v» \j 


ACGATGACGC 


CGGTGAACTT 


CCCGCCGCCG 


TTGTTGTTTT 


6480 


nn AGPAPGGA 

VJVJ^Vw ^— ^VV— VJ VJ.n 


AAGACGATGA 


CGGAAAAAGA 


GATCGTGGAT 


TACGTCGCCA 


GTCAAGTAAC 


6540 


AACCGCGAAA


AAGTTGCGCG 


GAGGAGTTGT 


GTTTGTGGAC 


GAAGTACCGA 


AAGGTCTTAC 


6600 


PGGAAAAPTP 


GAPG CAAGAA 


AAATCAGAGA 


GATCCTCATA 


AAGGCCAAGA 


AGGGCGGAAA 


6660 


GTPPAAATTG 


TAAAATGTAA


CTGTATTCAG 


CGATGACGAA 


ATTCTTAGCT 


ATTGTAATCA 


6720 


PATPPGPGAA 


TTTPPPPGAT 


CGTTCAAACA 


TTTGGCAATA 


AAGTTTCTTA 


AGATTGAATC 


6780 


ptpttgppgg 


TPTTPPGATG 


ATTATCATAT 


AATTTCTGTT 


GAATTACGTT 


AAGCATGTAA 


6840


T 1 A 7\ ' 1 M 1 '7V A P AT 


GTAATGPATG 


ACGTTATTTA


TGAGATGGGT 


TTTTATGATT 


AGAGTCCCGC 


6900 


A ATT AT AP AT 


TTAATAPGCG 


ATAGAAAACA 


AAATATAGCG 


CGCAAACTAG 


GATAAATTAT 


6960


v- vjv«.vjv-.vjv-vjvj 1 


GTPATPTATG 


TTACTAGATC 



GATCGGGAAT 


TGAGATCTCA 


TATGTCGAGC 


7020 


1 UuOVJUn iLl 


PPTTTGPPPC 

\<U1 X X VJ V..N-. V— V— 


AGAGATCACA 


ATGGACGACT 


TCCTCTATCT 


CTACGATCTA 


7080


GTPAGGAAGT 
vj X UnVJVjnnVJ 1 


TPGAPGGAGA 


AGGTGACGAT 


ACCATGTTCA 


CCACTGATAA 


TGAGAAGATT 


7140


nULU 1 X 1 X V»n 


ATTTP AG A A A 

n XXX UftOnrtft 


GAATGCTAAC 


CCACAGATGG 


TTAGAGAGGC 


TTACGCAGCA 


7200 


PPTPTPATP A 


AGAPGATPTA 


CCCGAGCAAT 


AATCTCCAGG 


AGATCAAATA 


CCTTCCCAAG 


7260


A AnnTTA A AP 
nnVjVj 1 inHno 


ATGPAGTPAA 

n X VJV-nVJ X V^nn 


AAGATTCAGG 


ACTAACTGCA 


TCAAGAACAC 


AGAGAAAGAT 


7320 


7A T A TTTPTP A 
rtini 1 V»n 


AGATPAGAAG 

AOri 1 V-.nVjnnvj 


TACTATTCCA 



GTATGGACGA 


TTCAAGGCTT 


GCTTCACAAA 


7380 


PPA AGGPAAG 

V- V^nnVj OUnrtvj 


TAATAGAGAT 

1 fin X rtUrtwii x 


TGGAGTCTCT 



AAAAAGGTAG 


TTCCCACTGA 


ATCAAAGGCC


7440 


ATGGAGTPAA 

n 1 VJVJnVJ 1 V»nn 


AGATTCAAAT 



AGAGGACCTA 


ACAGAACTCG 


CCGTAAAGAC 


TGGCGAACAG 


7500 


TTPPATCGAT 


GATTGAGACT 


TTTCAACAAA 


GGGTAATATC 


CGGAAACCTC 


CTCGGATTCC 


7560 


ATTGPPPAGC 


TATCTGTCAC 


TTTATTGTGA 


AGATAGTGGA 


AAAGGAAGGT 


GGCTCCTACA 


7620 


AATGCCATCA 


TTGCGATAAA 


GGAAAGGCCA 


TCGTTGAAGA 


TGCCTCTGCC 


GACAGTGGTC 


7680 


CCAAAGATGG 


ACCCCCACCC 


ACGAGGAGCA 


TCGTGGAAAA 


AGAAGACGTT 


CCAACCACGT 


7740 


PTTPAAAGCA 


AGTGGATTGA 



TGTGATATCT 


CCACTGACGT 


AAGGGATGAC 


GCACAATCCC 


7800 


APTATPPTTP 


GPAAGACCCT 


TCCTCTATAT 


AAGGAAGTTC 


ATTTCATTTG 


GAGAGGACAC 


7860 


GCTGACAAGC 


TCGGATCCTT 


TAGCATGATT 


CjAAC AACjA 1 w 


Vjnl 1 VjL.AU VJV^ 


AGGTTPTPPG 

nVjVj X X W 1 V^V^VJ 




GCCGCTTGGG 


TGGAGAGGCT 


ATTCGGCTAT 


GACTGGGCAC 


AACAGACAAT 


CGGCTGCTCT 


7980 


GATGCCGCCG 


TGTTCCGGCT 


GTCAGCGCAG 


GGGCGCCCGG 


TTCTTTTTGT 


CAAGACCGAC 


8040 


CTGTCCGGTG 


CCCTGAATGA 


ACTGCAGGAC 


GAGGCAGCGC 


GGCTATCGTG 


GCTGGCCACG 


8100
ACGGGCGTTC
CTATTGGGCG 
GTATCCATCA 
TTCGACCACC 
GTCGATCAGG 
AGGCTCAAGG 
TTGCCGAATA 
GGTGTGGCGG 
GGCGGCGAAT 
CGCATCGCCT 
TGACCGACCA 
ATGAAAGGTT 
GGGATCTCAT 
TACACCGGAC 
GCGCTGAGAC 
AGGGCATTGG 
CCCTCACTCC 
TCATTTGCAG 
TTCAAATCAG 
TAATATTGAT 
CGTCGCACAG 
TAGACATTAA 
CATTCTTGGG 
TATCCTCGGG 
TCGCAACGAA 
TTAAGCCAGC 
CCGGCATCCG 
TCACCGTCAT 
GTTAATGTCA 
CGCGGAACCC 
CAATAACCCT 
TTCCGTGTCG 
GAAACGCTGG 
GAACTGGATC 
ATGATGAGCA 
CAAGAGCAAC 
GTCACAGAAA 
ACCATGAGTG 



CTTGCGCAGC 
AAGTGCCGGG 
TGGCTGATGC 
AAGCGAAACA 
ATGATCTGGA 
CGCGCATGCC 
TCATGGTGGA 
ACCGCTATCA 
GGGCTGACCG 
TCTATCGCCT 
AGCGACGCCC 
GGGCTTCGGA 
GCTGGAGTTC 
TGGGCGCGGG 
CATGCTCAAG 
TGGAGCGCGC 
GCTTGATCTT 
TGCAGTATTT 
ATTATTGACT 
GATATCTCAA 
TGAAAATCTA 
CCCTGAGACT 
GACAAAAGCA 
TTGGATCATC 
ACCGGGGCAT 
CCCGACACCC 
CTTACAGACA 
CACCGAAACG 
TGATAATAAT 
CTATTTGTTT 
GATAAATGCT 
CCCTTATTCC 
TGAAAGTAAA 
TCAACAGCGG 
CTTTTAAAGT 
TCGGTCGCCG 
AGCATCTTAC 
ATAACACTGC 



TGTGCTCGAC 
GCAGGATCTC 
AATGCGGCGG 
TCGCATCGAG 
CGAAGAGCAT 
CGACGGCGAG 
AAATGGCCGC 
GGACATAGCG 
CTTCCTCGTG 
TCTTGACGAG 
AACCTGCCAT 
ATCGTTTTCC 
TTCGCCCACC 
ATAGGATATT 
GTAGGCAATG 
TTCGGGGATA 
GGCAAAGATA 
TCTATTCGAT 
GTCATTTGTA 
TCAAAACGTA 
TATGAGATTA 
GTTGGACATC 
CGGTTTGGCC 
TCATCAGGTC 
ATGGTGCACT 
GCCAACACCC 
AGCTGTGACC 
CGCGAGACGA 
GGTTTCTTAG 
ATTTTTCTAA 
TCAATAATAT 
CTTTTTTGCG 
AGATGCTGAA 
TAAGATCCTT 
TCTGCTATGT 
CATACACTAT 
GGATGGCATG 
GGCCAACTTA 



GTTGTCACTG 
CTGTCATCTC 
CTGCATACGC 
CGAGCACGTA 
CAGGGGCTCG 
GATCTCGTCG 
TTTTCTGGAT 
TTGGCTACCC 
CTTTACGGTA 
TTCTTCTGAG 
CACGAGATTT 
GGGACGCCGG 
CCAACAGAGG 
CAGATTGGGA 
TCCTCAGCGT 
CCGTGCTTGT 
TTTGACGCAT 
CTTTATGTAA 
TCAAATCGTG 
GATAATAATA 
CAAAATACCG
AACGGGTAGA 
GTTCCATTGC 
CAATCAAATT 
CTCAGTACAA 
GCTGACGCGC 
GTCTCCGGGA 
AAGGGCCTCG 
ACGTCAGGTG 
ATACATTCAA 
TGAAAAAGGA 
GCATTTTGCC 
GATCAGTTGG 
GAGAGTTTTC 
GGCGCGGTAT 
TCTCAGAATG 
ACAGTAAGAG 
CTTCTGACAA 



AAGCGGGAAG 
ACCTTGCTCC 
TTGATCCGGC 
CTCGGATGGA 
CGCCAGCCGA 
TGACCCATGG 
TCATCGACTG 
GTGATATTGC 
TCGCCGCTCC 
CGGGACTCTG 
CGATTCCACC 
CTGGATGATC 
TGGATGGACA 
TGGGATTGAG 
CGAGCCCGGC 
AACTGAGACC 
TTATTAGTAT 
TTCGTTACAA 
TTTAATGGAT 
ATATTTATTT 
ACAACATTAT 
TTCCTTCATG 
TGCACGAACG 
TGTCCAAGAA 
TCTGCTCTGA 
CCTGACGGGC 
GCTGCATGTG 
TGATACGCCT 
GCACTTTTCG 
ATATGTATCC 
AGAGTATGAG 
TTCCTGTTTT 
GTGCACGAGT 
GCCCCGAAGA 
TATCCCGTAT 
ACTTGGTTGA 
AATTATGCAG 
CGATCGGAGG 



GGACTGGCTG 
TGCCGAGAAA 
TACCTGCCCA 
AGCCGGTCTT 
ACTGTTCGCC 
CGATGCCTGC 
TGGCCGGCTG 
TGAAGAGCTT 
CGATTCGCAG 
GGGTTCGAAA 
GCCGCCTTCT 
CTCCAGCGCG 
GACCCGTTCT 
CTTAAAGCCG 
ATCTATGTCG 
GGATATGAGG 
GTGTTAATTT 
TTAATAAATA 
ATTTTTATTA 
AATATTTTTG 
TTAAGATACA 
CATAGCACCT 
AGCTTTGCTA 
CTCATGTTAG 
TGCCGCATAG 
TTGTCTGCTC 
TCAGAGGTTT 
ATTTTTATAG 
GGGAAATGTG 
GCTCATGAGA 
TATTCAACAT 
TGCTCACCCA 
GGGTTACATC 
ACGTTTTCCA 
TGACGCCGGG 
GTACTCACCA 
TGCTGCCATA 
ACCGAAGGAG 



8160 

8220 

8280 

8340 

8400 

8460 

8520 

8580 

8640 

8700 

8760 

8820 
8880 
8940 
9000 
9060 
9120 
9180 
9240 
9300 
9360 
9420 
9480 
9540 
9600 
9660 
9720 
9780 
9840 
9900 
9960 
10020 
10080 
10140 
10200 
10260 
10320 
10380 
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O XnnOOVjO X X 




CATGGGGGAT CATGTAACTC GCCTTGATCG TTGGGAACCG 10440


GAGCTGAATG AAGCCATACC AAACGACGAG CGTGACACCA CGATGCCTGT AGCAATGGCA 10500




ACAACGTTGC GGAAACTATT AACTGGCGAA CTACTTACTC TAGCTTCCCG GCAACAATTA 10560


ATAGACTGGA TGGAGGCGGA TAAAGTTGCA GGACCACTTC TGCGCTCGGC CCTTCCGGCT 10620


GGCTGGTTTA TTGCTGATAA ATCTGGAGCC GGTGAGCGTG GGTCTCGCGG TATCATTGCA 10680


GCACTGGGGC CAGATGGTAA GCCCTCCCGT ATCGTAGTTA TCTACACGAC GGGGAGTCAG 10740


GCAACTATGG ATGAACGAAA TAGACAGATC GCTGAGATAG GTGCCTCACT GATTAAGCAT 10800


TGGTAACTGT GAGACCAAGT TTACTCATAT ATACTTTAGA TTGATTTAAA ACTTCATTTT 10860


TAATTTAAAA GGATCTAGGT GAAGATCCTT TTTGATAATC TCATGACCAA AATCCCTTAA 10920


CGTGAGTTTT CGTTCCACTG AGCGTCAGAC CCCGTAGAAA AGATCAAAGG ATCTTCTTGA 10980


Un, X O O X XXXX 


TTCTGCGCGT AATCTGCTGC TTGCAAACAA AAAAACCACC GCTACCAGCG 11040


O X VJVJ X X X O X X 


TGCCGGATCA AGAGCTACCA ACTCTTTTTC CGAAGGTAAC TGGCTTCAGC 11100


AGAGCGCAGA TACCAAATAC TGTCCTTCTA GTGTAGCCGT AGTTAGGCCA CCACTTCAAG 11160


AACTCTGTAG CACCGCCTAC ATACCTCGCT CTGCTAATCC TGTTACCAGT GGCTGCTGCC 11220


AGTGGGGATA AGTCGTGTCT TACCGGGTTG GACTCAAGAC GATAGTTACC GGATAAGGCG 11280


CAGCGGTCGG GCTGAACGGG GGGTTCGTGC ACACAGCCCA GCTTGGAGCG AACGACCTAC 11340


ACCGAACTGA GATACCTACA GCGTGAGCAT TGAGAAAGCG CCACGCTTCC CGAAGGGAGA 11400


AAGGCGGACA GGTATCCGGT AAGCGGCAGG GTCGGAACAG GAGAGCGCAC GAGGGAGCTT 11460


CCAGGGGGAA ACGCCTGGTA TCTTTATAGT CCTGTCGGGT TTCGCCACCT CTGACTTGAG 11520


GGTGGATTTT TGTGATGCTC GTCAGGGGGG CGGAGCCTAT GGAAAAACGC CAGCAACGCG 11580


GGCTTTTTAC GGTTCCTGGC CTTTTGCTGG CCTTTTGCTC ACATGTTCTT TCCTGCGTTA 11640


TGCCCTGATT CTGTGGATAA CCGTATTACC GCCTTTGAGT GAGCTGATAC CGCTCGCCGC 11700


AGCCGAACGA 


CCGAGCGCAG 


CGAGTCAGTG 


AGCGAGGAAG 


CGGAAGAGCG 


CCCAATACGC 


11760 


AAACCGCCTC 


TCCCCGCGCG 


TTGGCCGATT 


CATTAATGCA 


GCTGGCACGA 


CAGGTTTCCC 


11820 


GACTGGAAAG 


CGGGCAGTGA 


GCGCAACGCA 


ATTAATGTGA 


GTTAGCTCAC 


TCATTAGGCA 


11880 


CCCCAGGCTT 


TACACTTTAT 


GCTTCCGGCT 


CGTATGTTGT 


GTGGAATTGT 


GAGCGGATAA 


11940 


CAATTTCACA 


CAGGAAACAG 


CTATGACCAT 


GATTACGCCA 


AGCTTCCGCG 


G 


11991 



(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
ACGTACGTAC GGGCCCACCA CTGTTGTAAC TTGTAAGCC 39

(2) INFORMATION FOR SEQ ID NO: 12: 
(i) SEQUENCE CHARACTERISTICS: 






(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

AGGCGGACCT TTGCACTGTG AGTTACCTTC GC 32 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

CTCTGTCGAC GAGCGCAGCT GCACGGGTC 29 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

GCGAAGGTAA CTCACAGTGC AAAGGTCCGC CT 32 
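
As an illustrative aside (not part of the listing as filed), the row layout used throughout, groups of ten bases followed by a running base count, lends itself to a simple consistency check. The Python sketch below is a minimal example under stated assumptions: the names `ROWS`, `parse_row`, `reverse_complement` and `check_rows` are hypothetical, and the rows are copied from SEQ ID NOS: 11 to 14 above with any scanning gaps in the counts closed up.

```python
import re

# Rows copied from SEQ ID NOS: 11-14: groups of ten bases, then the base count.
# The dictionary name and keys are illustrative, not part of the listing.
ROWS = {
    11: "ACGTACGTAC GGGCCCACCA CTGTTGTAAC TTGTAAGCC 39",
    12: "AGGCGGACCT TTGCACTGTG AGTTACCTTC GC 32",
    13: "CTCTGTCGAC GAGCGCAGCT GCACGGGTC 29",
    14: "GCGAAGGTAA CTCACAGTGC AAAGGTCCGC CT 32",
}


def parse_row(row):
    """Split one listing row into (sequence, declared base count)."""
    *groups, count = row.split()
    seq = "".join(groups).upper()
    # Anything other than A, C, G or T points to a transcription or scan error.
    if not re.fullmatch(r"[ACGT]+", seq):
        raise ValueError(f"non-ACGT character in {seq!r}")
    return seq, int(count)


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def check_rows(rows):
    """Return {seq_id: sequence}, raising if a declared length disagrees."""
    checked = {}
    for seq_id, row in rows.items():
        seq, count = parse_row(row)
        if len(seq) != count:
            raise ValueError(
                f"SEQ ID NO: {seq_id}: counted {len(seq)}, listing says {count}"
            )
        checked[seq_id] = seq
    return checked


SEQS = check_rows(ROWS)
```

Run as written, the sketch raises no error, so the four declared lengths agree with the printed bases. In addition, `reverse_complement(SEQS[12]) == SEQS[14]` evaluates to true, i.e. SEQ ID NO: 14 as printed is the reverse complement of SEQ ID NO: 12.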
(2) INFORMATION FOR SEQ ID NO: 15:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9299 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
GAGCTCCACC GCGGTGGCGG CCGCTCTAGA ACTAGTGGAT CCGTCGACCA TGGCCAGTTG 60
CCGGTGGAGC AGGTAAAAAC ACCGTAGCGT AGCAGCCAGG CGGAAGCAGA CGCACAGCAC 120

AGGTTGGTTA TGATAGTCAG CCGGGCCACA TGTGTGTAGT TGGTACACTG ATACGCTTAC 180 
ACTGTCTCTC CTTTCTTTTT TATTTGTCAC CTTTGGTCGA GCTTACATAA TTGTGTGACT 240 
AAAAAAAGGT CACTTCATTC AGAAATTTAG GGTTGTGGGA ATTTTGGATT TTATTGTGTC 300 
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m*** rn T% m tv t\ itt 

TGTATAGAGT 


AGCTATAGCT 


AGCTAGCTAG 


a tptp 21 tptt 

Hlul uA lull 


AATAATTATG 


ACGATGAGAT 


360 


TGGCCCGCTT 


GGCCGCTTGC 


ATTGTCTCCC 


TAPPTPAATA 


ATPTTTTPAP 


TTTGTCTTGC 


420 


CTTTCTTTCA


GCTCTAACAA ATTGGAGTAG 


PPATGAPTGA 


GATACATATA 


TAAAAGCGAA


480 


AACCGCTGCT 


CTCTGTTAAT 


TATTGCACAT 


p a p a p a t a pp 


PPAAGCCTTA 


AGGACAATC7V 


540 


ACTAAGGATG 


GTAATAACTA 


AGGCTAGTGA 


pptppa apta 


GGGATGTTAA 



TATACTCTAG 


600 


ATTTTAGACT 


ATAAAATTTA 


AGGATCGAAT 


p 2i p a tt 21 pt a 


TPGAACTATA 


TTTATATTCA


660 


TTTCTAAACT 


AAATTAATTA 


AGCACCCTAA 


A 1 Inl IvjI ijn 


TPA21PAPAPA 


TTTCGATCGT 



720 


T% rr*i « y*i tv f ||M 1 TV 

GATCCATTAT


TACTCCTTGG 


TCAAACTAAT 


PTP PT TTT 21 T 
1 v_o 111 Ini 


PTPAPTATTT 

\J 1 X -fi- xxx 


CATCATCTTT


780 


TTTGCGAACG 


GGTTTATAGC 


CCGTGTTCCA 


tt 2i tp 2i pp 21 P 

1 1H1 UnOUnL 


ATGAAPGGTT 

x unnuuvj x x 


TAAACAAAGT 


840 


TAC AT AT C A 1 


CCCAGCTAGC 


TACCTAGATT 


2i 2i p p a tpp 


GTTPGGTATA 


TATATATAGT 


900 


TTATATATTT 


GGTATATATA 


TATATATATA 


TATATATATA 


TATATATCAC 


ACGTCAGCTT 


960 


ATATTACGTA 


AAGTGGGGTT 


AGTTTTCAAG 


AAoLo 1 vjovjH 


PPAPTPAPPT 


CTGCAGTCTG 


1020 


ACCTTGGCTT 


CAGCTTCGAC 


AGCAAACAGT 


PaTPTPTTHf! 
LA 1 <vjvj 


A AGPTA AGGA 


CAGTCTCCAA


1080 


CAGTCAACAA 


AGCAGCGGTC 


TGCTTGTAGT 




71 PPZiPPAPPT 
ALvjAL^-AVjV^ 1 


ATATPTAGPA 

A X rt. lv« 1 n\J\—n 


1140 


TCATAACAAC 


GGTAAGATCA 


TCTCTAGCAC 


GALAAAL 1 1 A 


PI TTT 7i TITTa 21 
ul 1 1AA1 1AA 


TTATGTPTAA 


1200 


TCCGTTGTTG 


TTAGCTTAAA 


CTTTCTAGCC 


lLt lAlGLlA 


71P21PAPTTPT 
Avj AVjAvj 1 1 L 1 


PTAPTTPTAP 

V— X rWJ X X V_ InV.. 


1260 


TCAGGTGGAT 


TGATATATAA 


ATTGGGAATC 


1 lL.iAbuL.Vjl 


P 71 P 21 21 PPT A T 
LALAAvaVa 1 A 1 


GPTAPAPATP 


1320 


AATCAATGAA 


CGGACAAAGC 


AACGGTAAGA 


1 LLGALLLAVj 


TA A A APT A AT 


AGPGTTAGGG 


1380 


CATGTACAAC 


CTAGACACTG 


ATGCACAGTA 


PTPP1 7Af2T21T 


AAPAPAPAAP 


TAAAACACAA 


1440 


#-t tv mit tv m tv tv mn 

CATAATAATA 


CAGTGGTTAT 


ATCTAAAACA 


TPTPTPTT21 P 1 
IvjlulLl 1AL 


PATATTPATT 


GTACCAATTA 



1500 


GAACATTTAA 


TAAATTAAAG 


TGACCAATCA


uL 1 AGL.L ILL. 


TPTPTPGAAP 


ATAGAGCTAA 


1560 


GACATTGTGT 


CTTCGTCAAG 


ATACATGTCT 


1 AAG 111111 


TATATTPAPT 
1 *\ in X 1 V.AV. x 


CCCAAAGACA


1620 


CACTCTAAGA 


CACAACGTAA 


CACACCCATT 


pt a p n tp p TP 


TTAAPPTAAG 


TTATCATGGA 


1680 


TGACCACGCG 


TGGCAATTAA 


AAAAATAATT 


TTTflPPTPPT 
1 1 luLL 1LL1 


A A A APPTPTT 


TPTTAATTGG 

X V X X 4***. X X WW 


1740 


TTCTTGCTTG


CAAATCACCA 


GCGAACCCAT 


ATPA A APPAT 


GPTCAAAATC 


TGGCCACCGC 


1800 


ATCAGGG 1 1 G 


GTGAATGCAA 


VGTAAAAAAT 


AATPPATAAA 


TCAGCTCTCT 


GATCAGTTAT 


1860 


ATAATLGTGL 


CTTTTAATTA 


TTCATGCCAG 


PTTT21TPTP21 

LI 1 InlLl VJft 


PTPACGAAAT 


CATTGATAAA 


1920 


TTATTCCTCA


GCTGTATTAG 


AAAGAGCAGT 


PTTPTTT A A P 
Ullvjlll rVraV- 


TTGGAAAGTG 


ATGTGGAAGC 


1980 


GTGTGA i 1 GC 


GGTTGAGCTT 


GTATAGGAGT 


71 71 A ATPAPPA 


A P AGT AGG AA 


AATAATTTTT 


2040 


TCGGATTAAA 


ACCGGTTGTT 


TGGACTGCGG 


PAPAT21PA AT 


T PAT AG AG AT 


AAAAACACCG 


2100 


rn TV /**• 7\ TV /""T^ TV ' PH I 

TAGAAGTATT 


AGAAGCCGAT 


AAAGATTAAA 


pppna ATP A A 


PPAAPAGGCT 


AAACAAATCC 


2160 


GGCGCCTlaA 


AAGTCAAGAG 


CAGGTACTGG 


PPTPTPTTPP 


APAPGTCGCT 

rtV«.ftV*VJ X V.VJ\» A 


TTTTGTCTCC 


2220 


CCCTGGClll 


TGGGTGAGAG 


TAGTAGGGAT 


PPT A A 21PTTT 
uL 1 MAM.VJ 111 


PPTTTPTCTT 

V3\— X X X W X \- X X 


TTTGAGGCAT 


2280 


firry ta m7\ /*i/*»/**it' 

GTGATAGGCT 


CTTGTTAGTT 


GCTAGGGCTA 


loll 1 Al AA1 


ATTTPPGPTT 
i\ x x x o\».wv« x x 


TTACCTATGT


2340 


ACGTAAGAAC 


CGGATGGAAT 


AATGCTATGC 


AGGAACCAAT 


TATGTTTGGT 


tv TV TV TV TV m 

CGAAATATAT 


2400 


AGTGACCTAT 


CATAATGTTA 


TCCCTGTTCA 


TGTACCTAGG 


TGGCTAATGA 


TATACGGCAT 


2460 


ATGAATACAG 


TAATCATCCA 


AGCACGTAAA 


AACTCGCTAG 


ACGTTTATGC 


CTGCTAGCCT 


2520 


GCTGGGTGTG 


TAGACTGGAG 


TACTGGACAA 


ACATCGCAAT 


ACAGAGGTAC 


AGTATTTGTC 


2580
TAGACAATGA
TCTCCATGAG 
CGTATAAATA 
ACAGTAGCAA 
TGCAGAAGAA 
TTTCCAACCA 
AGCACAAGCA 
ACCCCGCGTT 
TTTCCTGCAT 
ATGAATCATC 
AGATGTGAAA 
TGCGAAAAAA 
GGATACGAAT 
AACAAAAGTA 
TTTGGTTACT 
TATGGTTCGA 
CTTGATACAG 
CTGTTATGGT 
CATAATGCGA 
TTCAACAGGC 
ATAGGTTCGT 
GCAGTGCTTG 
ACCAAAATTT 
CGTAGGAGCT 
CTACCTTATT 
GTCCCCTGCT 
ATATATACCA 
CCAGCAACAC 
TGGTCCGTCC 
TCAGTCTGGA 
AAAGCCGGGC 
GTAATTATGC 
GCCAGCGTAT 
ATCAGGAAGT 
ATGTTATTGC 
AGACTATCCC 
TCCATGATTT 
ACACCTGGGT 



TATACATAGA 
CTCTTCTCTC 
ATCTCACAAT 
CCAACGCCGC 
CAAATGTAAG 
ACGCCTCCTT 
CCTAACGCGA 
TGACCTGATC 
TGGACAGCAG 
CGGTCTTTCG 
CCACGAGACG 
AGGCGTGCCG 
CCATGCTCGA 
GTTGGATTAT 
TGAGTTCAGA 
CAAATATATC 
CTAGGATAAA 
AGTTTAAGTC 
TTGTTCACAA 
AAGTTGCATA 
CATTTTAGTT 
CTGTTTAATA 
TAATCTGTCA 
GAATTGTAAC 
CTATACGCAT 
CCAAACCAAC 
CCATATGCCC 
TCTTCTCTTA 
TGTAGAAACC 
TCGCGAAAAC 
AATTGCTGTG 
GGGCAACGTC 
CGTGCTGCGT 
GATGGAGCAT 
CGGGAAAAGT 
GCCGGGAATG 
CTTTAACTAT 
GGACGATATC 



TAAAAACCAC 
TGCTGTTTCT 
ACTTCCTTAT 
TCGATGTTTC 
CAGCTCCTAC 
CGAGGAACCT 
ATCATCACGA 
GGAAGCATGC 
AAGCCAGTCA 
TATTGATCAC 
AGCGATAAAT 
CTGTAGCATC 
CAGAACCCTC 
TTCAGAAGCA 
CAGTGTATGC 
AGTATATCAC 
GTTAGAACGA 
CATTCCTGGA 
TAAAATTGCA 
ATGCATGTGC 
TCACTCCTTC 
AGTGGCAGAG 
CTTTGAGCTA 
TTGATGGGAT 
CACCCTAACA 
TCTCCTTTCC 
ATCTTATGAA 
TAACATAGTA 
CCAACCCGTG 
TGTGGAATTG 
CCAGGCAGTT 
TGGTATCAGC 
TTCGATGCGG 
CAGGGCGGCT 
GTACGTATCA 
GTGATTACCG 
GCCGGAATCC 
ACCGTGGTGA 



TGTTGTAACT 
TCCTCTGCTA 
TTTCAGCATG 
CTTCAAGAAA 
AGGTACCAGT 
GGTCGTGCTG 
CGCGCCGTGT 
GTGTGTGTTG 
TGTTAGGCAC 
TAGTTCACTA 
CTTAGACGTT 
ATTCGTATAC 
CCTTCCCTGC 
AAATCTCACT 
TATATTGTCA 
TACTTTGTTA 
TGACTGATCT 
CGACTCCAGA 
TGATGTTCTT 
ATATATGAGC 
ACATTATTCC 
CTGTTTTCAC 
AAACTGAAGC 
TACTATAGCA 
ACCCGGCTGA 
TTGCATGCAC 
ACCATCCACA 
CAGCGAAGGT 
AAATCAAAAA 
ATCAGCGTTG 
TTAACGATCA 
GCGAAGTCTT 
TCACTCATTA 
ATACGCCATT 
CCGTTTGTGT 
ACGAAAACGG 
ATCGCAGCGT 
CGCATGTCGC 



TGTAAGCCAC 
ACTGCGTTAT 
GCCTCTTTTA 
CGGCCACTCA 
AGTCATGTCA 
ACATGAATGT 
ACTGGGCGTT 
GCTGCAGGAC 
TCACGCGCTC 
CGCTGATATA 
ACCTATCCAT 
ACTTTTGTCC 
AGATAACGAC 
TTTCGCTGGC 
TGTGCTGCGT 
TGGGTGGGGC 
ACTGTAAAGC 
TCCAGGATAT 
CTACTCTTTA 
AGCATAATCA 
AGCCCTTGAA 
TCCACCTACG 
ACCAAACCGC 
GTTGCTACAG 
CTGCTGCATC 
TACACCCACT 
AGAGGAGAAG 
AACTCACATG 
ACTCGACGGC 
GTGGGAAAGC 
GTTCGCCGAT 
TATACCGAAA 
CGGCAAAGTG 
TGAAGCCGAT 
GAACAACGAA 
CAAGAAAAAG 
AATGCTCTAC 
GCAAGACTGT 



TAGCTCACGT 
GATATGACGT 
TGTTTATTTA 
CTATGTGGTG 
GTGTGGAAGC 
AGGCCATGCA 
GGTACATCAC 
CGGCTATAGG 
CTGCCGTTTG 
GCAAATTTTA 
ATGAAGCTTG 
CCAAAGACAG 
ACTTAAGTAT 
CTTTTTGTAC 
AAGGTTTAAA 
CTAGCACAAA 
GACACCTGTC 
GATGCTGTTA 
GGCAGTTTTG 
TCAATTAATC 
GAAAAATGTA 
CTTGTCTAGG 
TACAAAAGAA 
TTCTAGCTAG 
TGACCCCACC 
TCCTGCAGCT 
AAACAATCAA 
GCAACTTCCA 
CTGTGGGCAT 
GCGTTACAAG 
GCAGATATTC 
GGTTGGGCAG 
TGGGTCAATA 
GTCACGCCGT 
CTGAACTGGC 
CAGTCTTACT 
ACCACGCCGA 
AACCACGCGT 
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L 1 G 1 1 G AL 1 G 


cchcr* tp p tp 
GLAGG 1 L7L7 1 G 


PPPA ATPPTP 
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a TPTPHPPPT 
AiulL/ibLbi 


TP A A PTPPPT 
X u/inL X uL V3 X 


PATPPPPATP 


ft u 


AHLiibb 1 LL ± 


X LL AAL X \J\Ji\ 


UnnVJ 0 ^ r\ X r\ 




n P A A C5Tf!mY5 

ULrinU X VJVJ X \J 


AATPPPPAPP 




1 L 1 vjoLAAv-L 


pppTPa appt 

LjLjLj 1 LAALL 1 


T A TPTPT ATP 


M-rtV-. IVjl VjV_\J x 


PAPAHPPAAA 


APPPAPAPAP 




AG lul G A 1 A 1 


L lALLLbL 1 i 


LoLu X ^UuLri 


TPPPPTPaf!T 
X LLLjLj X Lnu 1 


np. p a P/rrc a a p. 


PPPP A AP APT 


jIuu 


TLL xGAx X.AA 


npii PA A A PPP 

LLALAAALLb 


T^PT7V ^1/ 1 .» 1 11 1 1 a 
1 1L 1 AL X X XA 


PTPP PTTTPP 
L XLLjL 1 X i 0L1 


TPPTP A TP A A 
X L.L XLAXGAA 


P A TPPPP A PT 
LA X uLuuAL X 


JlOU 


TACGTGGLAA 


AGGA1 1LGA1 


A A HC ,r VHC tT VH A 
AALG X bL X bA 


TPPTPPapn a 

1 LjLj 1 LjLALLjA 


PP A PPP A TTA 
LLALGLAX XA 


A TPP A PTPP A 
A X ub AL X GGA 


coon 


TTGGGGlCAA 


pmppm* PPPT 

CTCClACLbl 


A PPTPPPATT 

ALL 1 LbLAl 1 


A PPPTTA PPP 

ALLLx 1ALGL 


TP A A P A P A TP 
1 G AAG AG A 1 G 


PTPP A P" l 'PPP 

L 1 LGAL 1 GGG 


coon 


CAGATGAACA 


•"PPPPA TPPTP 

TGGGATCGTG 


GTGATTGATG 


A A APTPPTPP 

AAAL 1 GL 1 GL 


TGTCGGCTTT 


AACCTCTCTT 


c "3 ji n 


TAGGCATTGG 


fnmmripTv A PPP 

TTTCGAAGCG 


PPPAAPA APP 

GGLAALAAGL 


PP A A A P A A PT 

LGAAAGAAL 1 


PTA PA PPPA A 

G1AGAGLGAA 


P A PPPA PTP A 

GAGGL AG T LA 


541 OU 


ACGGGGAAAC 


TPA PP A A PPP 

TLAGLAAGLG 


PA PTTA P A PP 

GAG 1 X AGAGG 


PP A TTA AAPA 

LGAX XAAAGA 


/-t /-tTP A T A P PP 

GL 1GA1AGLG 


PPTP APA AAA 
LG X GALAAAA 


C /> Pi 


accacccaag 


CGTGGTGATG 


TPP A PT A H "MP 

TGGAGTA xTG 


PP A A PP A A PP 

C LAALGAAL L 


pp a ta r*r*rT**v 
GGATAGLLG 1 


ftf»r*iftA APTPP 

L L GL AAGTGL 


c tr **i A 


ACGGGAATAT 


TTPPPPH PTP 

TTCGCCACTG 


/^rTT^Ti APPA A 

G G GGAAGG AA 


PPPPTA A A PT 

CGLGx AAAL x 


pp a r*r*r*c* a pp 
LGALLLGALG 


PPTPPP ATPA 

LG 1 LLGAx CA 


C C O A 

55 SO 


CCTGCGTCAA 


TGTAATGTTC 


TPPP A pn/imp 

X GLGALGL I G 


A P A PPP A TA P 

AL AL L G A X AL 


P ATP A PPP AT 
LA X LAGLGA X 


L 1 L 1 1 1 G A 1 G 


5540 


TGCTGTGCCT 


P A A PPPTT A T 

GAAL LGTTAj. 


TA PPP A TPPT 

I AGGGA 1 GG I 


A TPTPP A A A P 

A1G 1 LLAAAG 


PPPPPA T'l" PP 

LGGLGAX lib 


P A A A PP P P A P 

GAAALGG L AG 


C *7 A A 

5 / 00 


AGAAGGTACT 


PPAAAAAPAA 

GGAAAAAGAA 


/-trprp TP P P PT 

LlxLlGGLLx 


PPPAPPAPA A 

GGLAGGAGAA 


A PTP P A TP A P 

AL 1 GLA1 LAG 


PPP A TTA TP A 

LLGAX lAxLA 


57o0 
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/-i/ r * , PPPT 1 PP A T 

LbbLblbbAl 


a ppttapppp 
AGGX XAGGGG 


PPPTPPAPTP 
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A ATPTAPAPP 
AAXG XALALL 


PAPATPTPPA 
LA LA lbi GGA 
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GTGAAGAGx A 
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TPPPTPP ATA 
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TPTA TP A PPP 
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PP r T'^ ifMiiiii i^t A 

LG1L 1 1 1GA1 


LGLG 1 LAGLG 


5 0 0 U 




TP A A PA PPT 1 A 

i bAALAbb 1 A 


XGGAA1 1 ILL 


PPP A TTT T P. P 
LLGAX X X XLL 


P APPTPPPA A 
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PPP A T A TTP P* 
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PPP* A TPTTP A 
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MALjV-A X Lj iAA 


0 Z f± u 


TAATTAACAT 


PTA A TP PA TP 

G xAATGLA 1 G 


ACGTTATTTA 


TPAPATPPPT 

1 GAGA 1 GGG I 
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PPPA A A PTAP 
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A 7V 7V 7V 1^' f" 1 7V T 1 
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DJOU 
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PTP A TPTA TP 
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TTA PTA P A TP 

1 1AL1AGA1L 
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GA 1 LGGGAA 1 


TA A P PTTA TP 
1AAGL1 1A1L 


P AT A PPPTPP 
GAX ALLG X LG 


ca 0 n 


a p/'" i a ppp 


bbbbLLLbb I 


A PPP A A TTPP 
ALLLAA1 XLG 


PPPTATAPTP 
LLL 1AX AG XG 


AG X LG X AX X A 


PA ATTPAPTP 
L AA X X LAL X G 


c/1 0 n 
Oft 0 u 


PPPPTPPTTT 


T A P A A PPTPP 
1 ALAALb 1 LG 


TP. A PTP PP. A A 
X L»AL X LjVjLiAA 


a a r , r , r ,r mr , nn 

AALLL X oVjLLj 


TTA PPPA APT 
X X ALLLAAL X 


TA ATPPPPTT 
XAAXL.LLLX 1 


£ ca n 


ppapp apatp 


pppptttppp 
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L,ALjL X uuLu X 






PPATPPPPPT 
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P A A TPPPP A A 
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X X vj InnnUU X 
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tta a a attpp 


PP* 1 " 1 ■ A A A TTT 


TTnTT AAA TT* 
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TTAAPPAATA 
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6780 




PTPP APT ATT 
UiL LnU Inl 1 




paptppa Apn 


TPAAAPtdPtPP. 


AAAAAPPPTP 


6840


TATCAGGGCG 


ATGGCCCACT 


ACGTGAACCA 


TCACCCTAAT 


CAAGTTTTTT 


GGGGTCGAGG 


6900 


TGCCGTAAAG 


CACTAAATCG 


GAACCCTAAA 


GGGAGCCCCC 


GATTTAGAGC 


TTGACGGGGA 


6960 


AAGCCGGCGA 


ACGTGGCGAG 


AAAGGAAGGG 


AAGAAAGCGA 


AAGGAGCGGG 


CGCTAGGGCG 


7020 


CTGGCAAGTG 


TAGCGGTCAC 


GCTGCGCGTA 


ACCACCACAC 


CCGCCGCGCT 


TAATGCGCCG 


7080 


CTACAGGGCG 


CGTCCCAGGT 


GGCACTTTTC 


GGGGAAATGT 


GCGCGGAACC 


CCTATTTGTT
TATTTTTCTA AATACATTCA AATATGTATC CGCTCATGAG
TTCAATAATA TTGAAAAAGG AAGAGTATGA GTATTCAACA 
CCTTTTTTGC GGCATTTTGC CTTCCTGTTT TTGCTCACCC 
AAGATGCTGA AGATCAGTTG GGTGCACGAG TGGGTTACAT 
GTAAGATCCT TGAGAGTTTT CGCCCCGAAG AACGTTTTCC 
TTCTGCTATG TGGCGCGGTA TTATCCCGTA TTGACGCCGG 
GCATACACTA TTCTCAGAAT GACTTGGTTG AGTACTCACC 
CGGATGGCAT GACAGTAAGA GAATTATGCA GTGCTGCCAT 
CGGCCAACTT ACTTCTGACA ACGATCGGAG GACCGAAGGA 
ACATGGGGGA TCATGTAACT CGCCTTGATC GTTGGGAACC 
CAAACGACGA GCGTGACACC ACGATGCCTG TAGCAATGGC 
TAACTGGCGA ACTACTTACT CTAGCTTCCC GGCAACAATT 
ATAAAGTTGC AGGACCACTT CTGCGCTCGG CCCTTCCGGC 
AATCTGGAGC CGGTGAGCGT GGGTCTCGCG GTATCATTGC 
AGCCCTCCCG TATCGTAGTT ATCTACACGA CGGGGAGTCA 
ATAGACAGAT CGCTGAGATA GGTGCCTCAC TGATTAAGCA 
TTTACTCATA TATACTTTAG ATTGATTTAA AACTTCATTT 
TGAAGATCCT TTTTGATAAT CTCATGACCA AAATCCCTTA 
GAGCGTCAGA CCCCGTAGAA AAGATCAAAG GATCTTCTTG 
TAATCTGCTG CTTGCAAACA AAAAAACCAC CGCTACCAGC 
AAGAGCTACC AACTCTTTTT CCGAAGGTAA CTGGCTTCAG 
CTGTCCTTCT AGTGTAGCCG TAGTTAGGCC ACCACTTCAA 
CATACCTCGC TCTGCTAATC CTGTTACCAG TGGCTGCTGC 
TTACCGGGTT GGACTCAAGA CGATAGTTAC CGGATAAGGC 
GGGGTTCGTG CACACAGCCC AGCTTGGAGC GAACGACCTA 
AGCGTGAGCT ATGAGAAAGC GCCACGCTTC CCGAAGGGAG 
TAAGCGGCAG GGTCGGAACA GGAGAGCGCA CGAGGGAGCT 
ATCTTTATAG TCCTGTCGGG TTTCGCCACC TCTGACTTGA 
CGTCAGGGGG GCGGAGCCTA TGGAAAAACG CCAGCAACGC 
CCTTTTGCTG GCCTTTTGCT CACATGTTCT TTCCTGCGTT 
ACCGTATTAC CGCCTTTGAG TGAGCTGATA CCGCTCGCCG
GCGAGTCAGT GAGCGAGGAA GCGGAAGAGC GCCCAATACG 
GTTGGCCGAT TCATTAATGC AGCTGGCACG ACAGGTTTCC 
AGCGCAACGC AATTAATGTG AGTTAGCTCA CTCATTAGGC 
TGCTTCCGGC TCGTATGTTG TGTGGAATTG TGAGCGGATA 
GCTATGACCA TGATTACGCC AAGCTCGGAA TTAACCCTCA 
(2) INFORMATION FOR SEQ ID NO: 16: 
(i) SEQUENCE CHARACTERISTICS: 



ACAATAACCC 
TTTCCGTGTC 
AGAAACGCTG 
CGAACTGGAT 
AATGATGAGC 
GCAAGAGCAA 
AGTCACAGAA 
AACCATGAGT 
GCTAACCGCT 
GGAGCTGAAT 
AACAACGTTG 
AATAGACTGG 
TGGCTGGTTT 
AGCACTGGGG 
GGCAACTATG 
TTGGTAACTG 
TTAATTTAAA 
ACGTGAGTTT 
AGATCCTTTT 
GGTGGTTTGT 
CAGAGCGCAG 
GAACTCTGTA 
CAGTGGCGAT 
GCAGCGGTCG 
CACCGAACTG 
AAAGGCGGAC 
TCCAGGGGGA 
GCGTCGATTT 
GGCCTTTTTA 
ATCCCCTGAT 
CAGCCGAACG 
CAAACCGCCT 
CGACTGGAAA 
ACCCCAGGCT 
ACAATTTCAC 
CTAAAGGGAA 



TGATAAATGC 
GCCCTTATTC 
GTGAAAGTAA 
CTCAACAGCG 
ACTTTTAAAG 
CTCGGTCGCC 
AAGCATCTTA 
GATAACACTG 
TTTTTGCACA 
GAAGCCATAC
CGCAAACTAT 
ATGGAGGCGG 
ATTGCTGATA 
CCAGATGGTA 
GATGAACGAA 
TCAGACCAAG 
AGGATCTAGG 
TCGTTCCACT 
TTTCTGCGCG 
TTGCCGGATC 
ATACCAAATA 
GCACCGCCTA 
AAGTCGTGTC 
GGCTGAACGG 
AGATACCTAC 
AGGTATCCGG 
AACGCCTGGT 
TTGTGATGCT 
CGGTTCCTGG 
TCTGTGGATA 
ACCGAGCGCA 
CTCCCCGCGC 
GCGGGCAGTG 
TTACACTTTA 
ACAGGAAACA 
CAAAAGCTG
7320
7380 
7440 
7500 
7560 
7620 
7680 
7740 
7800 
7860 
7920 
7980 
8040 
8100 
8160 
8220 
8280 
8340 
8400 
8460 
8520 
8580 
8640 
8700 
8760 
8820 
8880 
8940 
9000 
9060
9120 
9180 
9240 
9299
(A) LENGTH: 9408 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 



GAGCTCCACC 


GCGGTGGCGG 


CCGCTCTAGA 


ACTAGTGGAT 


CCTCTAGAGT 


CGACCATGGC 


60 


CAGTTGCCGG 


TGGAGCAGGT 


AAAAACACCG 


TAGCGTAGCA 


GCCAGGCGGA 


AGCAGACGCA 


120 


CAGCACAGGT 


TGGTTATGAT 


AGTCAGCCGG 


GCCACATGTG 


TGTAGTTGGT 


ACACTGATAC 


180 


GCTTACACTG 


TCTCTCCTTT 


CTTTTTTATT 


TGTCACCTTT 


GGTCGAGCTT 


ACATAATTGT 


240 


GTGACTAAAA 


AAAGGTCACT 


TCATTCAGAA 


ATTTAGGGTT 


GTGGGAATTT 


TGGATTTTAT 


300 


TGTGTCTGTA 


TAGAGTAGCT 


ATAGCTAGCT 


AGCTAGATGT 


GATGTTAATA 


ATTATGACGA 


360 


TGAGATTGGC 


CCGCTTGGCC 


GCTTGCATTG 


TCTCCCTAGC 


TCAATAATGT 


TTTGAGTTTG 


420 


TCTTGCCTTT 


CTTTCAGCTC 


TAACAAATTG 


GAGTAGGGAT 


GACTGAGATA 


CATATATAAA 


480 


AGCGAAAACC 


GCTGCTCTCT 


GTTAATTATT 


GCACATCACA 


CATAGGCCAA 


GCCTTAAGGA 


540 


CAATCAACTA 


AGGATGGTAA 


TAACTAAGGC 


TAGTGAGGTC 


GAACTAGGGA 


TGTTAATATA 


600 


CTCTAGATTT 


TAGACTATAA 


AATTTAAGGA 


TCGAATCAGA 


TTAGTATCGA 


ACTATATTTA 


660 


TATTCATTTC 


TAAACTAAAT 


TAATTAAGCA 


CCCTAAATTA 


TTGTGATGAA 


GAGACATTTC 


720 


GATCGTGATC 


CATTATTACT 


CCTTGGTCAA 


ACTAATCTCG 


TTTTATGTCA 


CTATTTCATC 


780 


ATCTTTTTTG 


CGAACGGGTT 


TATAGCCCGT 


GTTCCATTAT 


GAGGACATGA 


ACGGTTTAAA 


840 


CAAAGTTACA 


TATCATCCCA 


GCTAGCTACC 


TAGATTGGAA 


GCATGGGTTC 


GGTATATATA 


900 


TATAGTTTAT 


ATATTTGGTA 


TATATATATA 


TATATATATA 


TATATATATA 


TATCACACGT 


960 


CAGCTTATAT 


TACGTAAAGT 


GGGGTTAGTT 


TTCAAGAAGC 


GTGGGACCAG 


TCACCTCTGC 


1020 


AGTCTGACCT 


TGGCTTCAGC 


TTCGACAGCA 


AACAGTCATC 


TCTTGGAAGC 


TAAGGACAGT 


1080 


CTCCAACAGT 


CAACAAAGCA 


GCGGTCTGCT 


TGTAGTTCTC 


CCTTGCACGA 


CCAGCTATAT 


1140 


CTAGCATCAT 


AACAACGGTA 


AGATCATCTC 


TAGCACGACA 


AACTTAGTTT 


AATTAATTAT 


1200 


GTCTAATCCG 


TTGTTGTTAG 


CTTAAACTTT 


CTAGCCTCCT 


ATGCTAAGAG 


AGTTCTCTAG 


1260 


TTCTACTCAG 


GTGGATTGAT 


ATATAAATTG 


GGAATCTTCT 


AGGCGTCACA 


AGGTATGGTA 


1320 


CACATCAATC 


AATGAACGGA 


CAAAGCAACG 


GTAAGATCCG 


ACCCAGTAAA 


AGTAATAGCG 


1380 


TTAGGGCATG 


TACAACCTAG 


ACACTGATGC 


ACAGTACTCC 


AAGTATAAGA 


CACAACTAAA 


1440 


ACACAACATA 


ATAATACAGT 


GGTTATATCT 


AAAACATGTG 


TCTTACCATA 


TTCATTGTAC 


1500 


CAATTAGAAC 


ATTTAATAAA 


TTAAAGTGAC 


CAATCAGCTA 


GCCTCCTGTC 


TCGAACATAG 


1560 


AGCTAAGACA 


TTGTGTCTTC 


GTCAAGATAC 


ATGTCTTAAG 


TTTTTTTATA 


TTCACTCCCA 


1620 


AAGACACACT 


CTAAGACACA 


ACGTAACACA 


CCCATTGTAC 


ATGCTCTTAA 


CCTAAGTTAT 


1680 


CATGGATGAC 


CACGCGTGGC 


AATTAAAAAA 


ATAATTTTTG 


CCTCCTAAAA 


CCTCTTTCTT 


1740 


AATTGGTTCT 


TGCTTGCAAA 


TCACCAGCGA 


ACCCATATGA AAGGATGCTC 


AAAATCTGGC 


1800 


CACCGCATCA 


GGGTTGGTGA 


ATGCAAVGTA 


AAAAATAATG 


CATAAATCAG 


CTCTCTGATC 


1860 


AGTTATATAA 


TCGTGCCTTT 


TAATTATTCA 


TGCCAGCTTT 


ATCTGACTCA 


CGAAATCATT 


1920
GATAAATTAT
GGAAGCGTGT 
ATTTTTTCGG 
ACACCGTAGA 
AAATCCGGCG 
GTCTCCCCCT 
AGGCATGTGA
CTATGTACGT 
ATATATAGTG 
CGGCATATGA 
TAGCCTGCTG 
TTTGTCTAGA 
TCACGTTCTC 
TGACGTCGTA 
TATTTAACAG 
GTGGTGTGCA 
GGAAGCTTTC 
CATGCAAGCA 
CATCACACCC 
TATAGGTTTC 
CGTTTGATGA 
ATTTTAAGAT 
AGCTTGTGCG 
AGACAGGGAT 
AAGTATAACA 
TTGTACTTTG 
TTTAAATATG 
CACAAACTTG 
CCTGTCCTGT 
CTGTTACATA 
GTTTTGTTCA 
TTAATCATAG 
AATGTAGCAG 
TCTAGGACCA 
AAAGAACGTA 
AGCTAGCTAC 
CCCACCGTCC 
GCAGCTATAT 



TCCTCAGCTG 
GATTGCGGTT 
ATTAAAACCG 
AGTATTAGAA 
CCTCAAAAGT 
GGCCCCTGGG 
TAGGCTCTTG 
AAGAACCGGA 
ACCTATCATA 
ATACAGTAAT 
GGTGTGTAGA 
CAATGATATA 
CATGAGCTCT 
TAAATAATCT 
TAGCAACCAA 
GAAGAACAAA 
CAACCAACGC 
CAAGCACCTA 
CGCGTTTGAC 
CTGCATTGGA 
ATCATCCGGT 
GTGAAACCAC 
AAAAAAAGGC 
ACGAATCCAT 
AAAGTAGTTG 
GTTACTTGAG 
GTTCGACAAA 
ATACAGCTAG 
TATGGTAGTT 
ATGCGATTGT 
ACAGGCAAGT 
GTTCGTCATT 
TGCTTGCTGT 
AAATTTTAAT 
GGAGCTGAAT 
CTTATTCTAT 
CCTGCTCCAA 
ATACCACCAT 



TATTAGAAAG 
GAGCTTGTAT 
GTTGTTTGGA 
GCCGATAAAG 
CAAGAGCAGG 
TGAGAGTAGT 
TTAGTTGCTA 
TGGAATAATG 
ATGTTATCCC 
CATCCAAGCA 
CTGGAGTACT 
CATAGATAAA 
TCTCTCTGCT 
CACAATACTT 
CGCCGCTCGA 
TGTAAGCAGC 
CTCCTTCGAG 
ACGCGAATCA 
CTGATCGGAA 
CAGCAGAAGC 
CTTTCGTATT 
GAGACGAGCG 
GTGCCGCTGT 
GCTCGACAGA 
GATTATTTCA 
TTCAGACAGT 
TATATCAGTA 
GATAAAGTTA 
TAAGTCCATT 
TCACAATAAA 
TGCATAATGC 
TTAGTTTCAC 
TTAATAAGTG 
CTGTCACTTT 
TGTAACTTGA 
ACGCATCACC 
ACCAACTCTC 
ATGCCCATCT 



AGCAGTGTTG 
AGGAGTAAAA 
CTGCGGCAGA 
ATTAAACCCA 
TACTGGGCTG 
AGGGATGCTA 
GGGCTATGTT 
CTATGCAGGA 
TGTTCATGTA 
CGTAAAAACT 
GGACAAACAT 
AACCACTGTT 
GTTTCTTCCT 
CCTTATTTTC 
TGTTTCCTTC 
TCCTACAGGT 
GAACCTGGTC 
TCACGACGCG 
GCATGCGTGT 
CAGTCATGTT 
GATCACTAGT 
ATAAATCTTA 
AGCATCATTC 
ACCCTCCCTT 
GAAGCAAAAT 
GTATGCTATA 
TATCACTACT 
GAACGATGAC 
CCTGGACGAC 
ATTGCATGAT 
ATGTGCATAT 
TCCTTCACAT 
GCAGAGCTGT 
GAGCTAAAAC 
TGGGATTACT 
CTAACAACCC 
CTTTCCTTGC 
TATGAAACCA 



TTTAACTTGG 
TGAGGAACAG 
TACAATTCAT 
AATGAACGAA 
TCTTGCACAC 
AAGTTTGCTT 
TATAATATTT 
ACCAATTATG 
CCTAGGTGGC 
CGCTAGACGT 
CGCAATACAG 
GTAACTTGTA 
CTGCTAACTG 
AGCATGGCCT 
AAGAAACGGC 
ACCAGTAGTC 
GTGCTGACAT 
CCGTGTACTG 
GTGTTGGCTG 
AGGCACTCAC 
TCACTACGCT 
GACGTTACCT 
GTATACACTT 
CCCTGCAGAT 
CTCACTTTTC 
TTGTCATGTG 
TTGTTATGGG 
TGATCTACTG 
TCCAGATCCA 
GTTCTTCTAC 
ATGAGCAGCA 
TATTCCAGCC 
TTTCACTCCA 
TGAAGCACCA 
ATAGCAGTTG 
GGCTGACTGC 
ATGCACTACA 
TCCACAAGAG 



AAAGTGATGT 
TAGGAAAATA 
AGAGATAAAA 
CAGGCTAAAC 
GTCGCTTTTT 
TCTCTTTTTG 
GCGCTTTTAC 
TTTGGTCGAA 
TAATGATATA 
TTATGCCTGC 
AGGTACAGTA 
AGCCACTAGC 
CGTTATGATA 
CTTTTATGTT 
CACTCACTAT 
ATGTCAGTGT 
GAATGTAGGC 
GGCGTTGGTA 
CAGGACCGGC 
GCGCTCCTGC 
GATATAGCAA 
ATCCATATGA 
TTGTCCCCAA 
AACGACACTT 
GCTGGCCTTT 
CTGCGTAAGG 
TGGGGCCTAG 
TAAAGCGACA 
GGATATGATG 
TCTTTAGGCA 
TAATCATCAA 
CTTGAAGAAA 
CCTACGCTTG 
AACCGCTACA 
CTACAGTTCT 
TGCATCTGAC 
CCCACTTCCT 
GAGAAGAAAC 



1980_ 

2040 

2100 

2160 

2220 

2280 

2340 

2400 

2460 

2520 

2580 

2640 

2700 

2760 

2820 

2880 

2940 

3000 

3060 

3120 

3180 

3240 

3300 

3360 

3420 

3480 

3540 

3600 

3660 

3720 

3780 

3840 

3900 

3960 

4020 

4080 

4140 

4200 
AATCAACCAG CAACACTCTT CTCTTATAAC ATAGTACAGC GAAGGTAACT CACATGGCAA 4260
CTTCCATGGT CCGTCCTGTA GAAACCCCAA CCCGTGAAAT CAAAAAACTC GACGGCCTGT 4320
GGGCATTCAG TCTGGATCGC GAAAACTGTG GAATTGATCA GCGTTGGTGG GAAAGCGCGT 4380
TACAAGAAAG CCGGGCAATT GCTGTGCCAG GCAGTTTTAA CGATCAGTTC GCCGATGCAG 4440
ATATTCGTAA TTATGCGGGC AACGTCTGGT ATCAGCGCGA AGTCTTTATA CCGAAAGGTT 4500
GGGCAGGCCA GCGTATCGTG CTGCGTTTCG ATGCGGTCAC TCATTACGGC AAAGTGTGGG 4560
TCAATAATCA GGAAGTGATG GAGCATCAGG GCGGCTATAC GCCATTTGAA GCCGATGTCA 4620
CGCCGTATGT TATTGCCGGG AAAAGTGTAC GTATCACCGT TTGTGTGAAC AACGAACTGA 4680
ACTGGCAGAC TATCCCGCCG GGAATGGTGA TTACCGACGA AAACGGCAAG AAAAAGCAGT 4740
CTTACTTCCA TGATTTCTTT AACTATGCCG GAATCCATCG CAGCGTAATG CTCTACACCA 4800
CGCCGAACAC CTGGGTGGAC GATATCACCG TGGTGACGCA TGTCGCGCAA GACTGTAACC 4860
ACGCGTCTGT TGACTGGCAG GTGGTGGCCA ATGGTGATGT CAGCGTTGAA CTGCGTGATG 4920
CGGATCAACA GGTGGTTGCA ACTGGACAAG GCACTAGCGG GACTTTGCAA GTGGTGAATC 4980
CGCACCTCTG GCAACCGGGT GAAGGTTATC TCTATGAACT GTGCGTCACA GCCAAAAGCC 5040
AGACAGAGTG TGATATCTAC CCGCTTCGCG TCGGCATCCG GTCAGTGGCA GTGAAGGGCG 5100
AACAGTTCCT GATTAACCAC AAACCGTTCT ACTTTACTGG CTTTGGTCGT CATGAAGATG 5160
CGGACTTACG TGGCAAAGGA TTCGATAACG TGCTGATGGT GCACGACCAC GCATTAATGG 5220
ACTGGATTGG GGCCAACTCC TACCGTACCT CGCATTACCC TTACGCTGAA GAGATGCTCG 5280
ACTGGGCAGA TGAACATGGC ATCGTGGTGA TTGATGAAAC TGCTGCTGTC GGCTTTAACC 5340
TCTCTTTAGG CATTGGTTTC GAAGCGGGCA ACAAGCCGAA AGAACTGTAC AGCGAAGAGG 5400
CAGTCAACGG GGAAACTCAG CAAGCGCACT TACAGGCGAT TAAAGAGCTG ATAGCGCGTG 5460
ACAAAAACCA CCCAAGCGTG GTGATGTGGA GTATTGCCAA CGAACCGGAT ACCCGTCCGC 5520
AAGTGCACGG GAATATTTCG CCACTGGCGG AAGCAACGCG TAAACTCGAC CCGACGCGTC 5580
CGATCACCTG CGTCAATGTA ATGTTCTGCG ACGCTCACAC CGATACCATC AGCGATCTCT 5640
TTGATGTGCT GTGCCTGAAC CGTTATTACG GATGGTATGT CCAAAGCGGC GATTTGGAAA 5700
CGGCAGAGAA GGTACTGGAA AAAGAACTTC TGGCCTGGCA GGAGAAACTG CATCAGCCGA 5760
TTATCATCAC CGAATACGGC GTGGATACGT TAGCCGGGCT GCACTCAATG TACACCGACA 5820
TGTGGAGTGA AGAGTATCAG TGTGCATGGC TGGATATGTA TCACCGCGTC TTTGATCGCG 5880
TCAGCGCCGT CGTCGGTGAA CAGGTATGGA ATTTCGCCGA TTTTGCGACC TCGCAAGGCA 5940
TATTGCGCGT TGGCGGTAAC AAGAAAGGGA TCTTCACTCG CGACCGCAAA CCGAAGTCGG 6000
CGGCTTTTCT GCTGCAAAAA CGCTGGACTG GCATGAACTT CGGTGAAAAA CCGCAGCAGG 6060
GAGGCAAACA ATGAATCAAC AACTCTCCTG GCGCACCATC GTCGGCTACA GCCTCGGGAA 6120
TTGCTACCGA GCTTCTCGAG GGCACTGAAG TCGCTTGATG TGCTGAATTG TTTGTGATGT 6180
TGGTGGCGTA TTTTGTTTAA ATAAGTAAGC ATGGCTGTGA TTTTATCATA 6240
TTGGGGTTTT ATTTAACACA TTGTAAAATG TGTATCTATT AATAACTCAA TGTATAAGAT 6300
GTGTTCATTC TTCGGTTGCC ATAGATCTGC TTATTTGACC TGTGATGTTT TGACTCCAAA 6360
AACCAAAATC ACAACTCAAT AAACTCATGG AATATGTCCA CCTGTTTCTT GAAGAGTTCA 6420
TCTACCATTC CAGTTGGCAT TTATCAGTGT TGCAGCGGCG CTGTGCTTTG TAACATAACA 6480
ATTGTTCACG GCATATATCC AAATCTAGAG AAGCTTATCG ATACCGTCGA CCTCGAGGGG 6540
GGGCCCGGTA CCCAATTCGC CCTATAGTGA GTCGTATTAC AATTCACTGG CCGTCGTTTT 6600
ACAACGTCGT GACTGGGAAA ACCCTGGCGT TACCCAACTT AATCGCCTTG CAGCACATCC 6660
CCCTTTCGCC AGCTGGCGTA ATAGCGAAGA GGCCCGCACC GATCGCCCTT CCCAACAGTT 6720
GCGCAGCCTG AATGGCGAAT GGCGCGAAAT TGTAAACGTT AATATTTTGT TAAAATTCGC 6780
GTTAAATTTT TGTTAAATCA GCTCATTTTT TAACCAATAG GCCGAAATCG GCAAAATCCC 6840
TTATAAATCA AAAGAATAGA CCGAGATAGG GTTGAGTGTT GTTCCAGTTT GGAACAAGAG 6900
TCCACTATTA AAGAACGTGG ACTCCAACGT CAAAGGGCGA AAAACCGTCT ATCAGGGCGA 6960
TGGCCCACTA CGTGAACCAT CACCCTAATC AAGTTTTTTG GGGTCGAGGT GCCGTAAAGC 7020
ACTAAATCGG AACCCTAAAG GGAGCCCCCG ATTTAGAGCT TGACGGGGAA AGCCGGCGAA 7080
CGTGGCGAGA AAGGAAGGGA AGAAAGCGAA AGGAGCGGGC GCTAGGGCGC TGGCAAGTGT 7140
AGCGGTCACG CTGCGCGTAA CCACCACACC CGCCGCGCTT AATGCGCCGC TACAGGGCGC 7200
GTCCCAGGTG GCACTTTTCG GGGAAATGTG CGCGGAACCC CTATTTGTTT ATTTTTCTAA 7260
ATACATTCAA ATATGTATCC GCTCATGAGA CAATAACCCT GATAAATGCT TCAATAATAT 7320
TGAAAAAGGA AGAGTATGAG TATTCAACAT TTCCGTGTCG CCCTTATTCC CTTTTTTGCG 7380
GCATTTTGCC TTCCTGTTTT TGCTCACCCA GAAACGCTGG TGAAAGTAAA AGATGCTGAA 7440
GATCAGTTGG GTGCACGAGT GGGTTACATC GAACTGGATC TCAACAGCGG TAAGATCCTT 7500
GAGAGTTTTC GCCCCGAAGA ACGTTTTCCA ATGATGAGCA CTTTTAAAGT TCTGCTATGT 7560
GGCGCGGTAT TATCCCGTAT TGACGCCGGG CAAGAGCAAC TCGGTCGCCG CATACACTAT 7620
TCTCAGAATG ACTTGGTTGA GTACTCACCA GTCACAGAAA AGCATCTTAC GGATGGCATG 7680
ACAGTAAGAG AATTATGCAG TGCTGCCATA ACCATGAGTG ATAACACTGC GGCCAACTTA 7740
CTTCTGACAA CGATCGGAGG ACCGAAGGAG CTAACCGCTT TTTTGCACAA CATGGGGGAT 7800
CATGTAACTC GCCTTGATCG TTGGGAACCG GAGCTGAATG AAGCCATACC AAACGACGAG 7860
CGTGACACCA CGATGCCTGT AGCAATGGCA ACAACGTTGC GCAAACTATT AACTGGCGAA 7920
CTACTTACTC TAGCTTCCCG GCAACAATTA ATAGACTGGA TGGAGGCGGA TAAAGTTGCA 7980
GGACCACTTC TGCGCTCGGC CCTTCCGGCT GGCTGGTTTA TTGCTGATAA ATCTGGAGCC 8040
GGTGAGCGTG GGTCTCGCGG TATCATTGCA GCACTGGGGC CAGATGGTAA GCCCTCCCGT 8100
ATCGTAGTTA TCTACACGAC GGGGAGTCAG GCAACTATGG ATGAACGAAA TAGACAGATC 8160
GCTGAGATAG GTGCCTCACT GATTAAGCAT TGGTAACTGT CAGACCAAGT TTACTCATAT 8220
ATACTTTAGA TTGATTTAAA ACTTCATTTT TAATTTAAAA GGATCTAGGT GAAGATCCTT 8280
TTTGATAATC TCATGACCAA AATCCCTTAA CGTGAGTTTT CGTTCCACTG AGCGTCAGAC 8340
CCCGTAGAAA AGATCAAAGG ATCTTCTTGA GATCCTTTTT TTCTGCGCGT AATCTGCTGC 8400
TTGCAAACAA AAAAACCACC GCTACCAGCG GTGGTTTGTT TGCCGGATCA AGAGCTACCA 8460
ACTCTTTTTC CGAAGGTAAC TGGCTTCAGC AGAGCGCAGA TACCAAATAC TGTCCTTCTA 8520
GTGTAGCCGT AGTTAGGCCA CCACTTCAAG AACTCTGTAG CACCGCCTAC ATACCTCGCT 8580
CTGCTAATCC TGTTACCAGT GGCTGCTGCC AGTGGCGATA AGTCGTGTCT TACCGGGTTG 8640
GACTCAAGAC GATAGTTACC GGATAAGGCG CAGCGGTCGG GCTGAACGGG GGGTTCGTGC 8700
ACACAGCCCA GCTTGGAGCG AACGACCTAC ACCGAACTGA GATACCTACA GCGTGAGCTA 8760
TGAGAAAGCG CCACGCTTCC CGAAGGGAGA AAGGCGGACA GGTATCCGGT AAGCGGCAGG 8820
GTCGGAACAG GAGAGCGCAC GAGGGAGCTT CCAGGGGGAA ACGCCTGGTA TCTTTATAGT 8880 
CCTGTCGGGT TTCGCCACCT CTGACTTGAG CGTCGATTTT TGTGATGCTC GTCAGGGGGG 8940 
CGGAGCCTAT GGAAAAACGC CAGCAACGCG GCCTTTTTAC GGTTCCTGGC CTTTTGCTGG 9000 
CCTTTTGCTC ACATGTTCTT TCCTGCGTTA TCCCCTGATT CTGTGGATAA CCGTATTACC 9060 
GCCTTTGAGT GAGCTGATAC CGCTCGCCGC AGCCGAACGA CCGAGCGCAG CGAGTCAGTG 9120 
AGCGAGGAAG CGGAAGAGCG CCCAATACGC AAACCGCCTC TCCCCGCGCG TTGGCCGATT 9180 
CATTAATGCA GCTGGCACGA CAGGTTTCCC GACTGGAAAG CGGGCAGTGA GCGCAACGCA 9240 
ATTAATGTGA GTTAGCTCAC TCATTAGGCA CCCCAGGCTT TACACTTTAT GCTTCCGGCT 9300 
CGTATGTTGT GTGGAATTGT GAGCGGATAA CAATTTCACA CAGGAAACAG CTATGACCAT 9360 
GATTACGCCA AGCTCGGAAT TAACCCTCAC TAAAGGGAAC AAAAGCTG 9408 
(2) INFORMATION FOR SEQ ID NO: 17: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

TTATCTCGAG GGCACTGAAG TCGCTTGATG TGCTGAATT 39 
(2) INFORMATION FOR SEQ ID NO: 18: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
GGGGAAGCTT CTCTAGATTT GGATATATGC CGTGAACAAT TG 42 
(2) INFORMATION FOR SEQ ID NO: 19: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9335 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

AGCTTGCATG CCTGCAGGCC GGCCTTAATT AAGCGGCCGC CAGTGTGATG GATATCTGCA 60 
GAATTCGGCT TGGGGGATCC TCTAGACAAT GATATACATA GATAAAAACC ACTGTTGTAA 120

CTTGTAAGCC ACTAGCTCAC GTTCTCCATG AGCTCTTCTC TCTGCTGTTT CTTCCTCTGC 180 

TAACTGCGTT ATGATATGAC GTCGTATAAA TAATCTCACA ATACTTCCTT ATTTTCAGCA 240 

TGGCCTCTTT TATGTTTATT TAACAGTAGC AACCAACGCC GCTCGATGTT TCCTTCAAGA 300 

AACGGCCACT CACTATGTGG TGTGCAGAAG AACAAATGTA AGCAGCTCCT ACAGGTACCA 360 

GTAGTCATGT CAGTGTGGAA GCTTTCCAAC CAACGCCTCC TTCGAGGAAC CTGGTCGTGC 420 

TGACATGAAT GTAGGCCATG CAAGCACAAG CACCTAACGC GAATGATCAC GACGCGCCGT 480

GTACTGGGCG TTGGTACATC ACACCCCGCG TTTGACCTGA TCGGAAGCAT GCGTGTGTGT 540 

TGGCTGCAGG ACCGGCTATA GGTTTCCTGC ATTGGACAGC AGAAGCCAGT CATGTTAGGC 600 

ACTCACGCGC TCCTGCCGTT TGATGAATCA TCCGGTCTTT CGTATTGATC ACTAGTTCAC 660 

TACGCTGATA TAGCAAATTT TAAGATGTGA AACCACGAGA CGAGCGATAA ATCTTAGACG 720

TTACCTATCC ATATGAAGCT TGTGCGAAAA AAAGGCGTGC CGCTGTAGCA TCATTCGTAT 780 
ACACTTTTGT CCCCAAAGAC AGGGATACGA ATCCATGCTC GACAGAACCC TCCCTTCCCT 840 
GCAGATAACG ACACTTAAGT ATAACAAAAG TAGTTGGATT ATTTCAGAAG CAAAATCTCA 900 
CTTTTCGCTG GCCTTTTTGT ACTTTGGTTA CTTGAGTTCA GACAGTGTAT GCTATATTGT 960 

CATGTGCTGC GTAAGGTTTA AATATGGTTC GACAAATATA TCAGTATATC ACTACTTTGT 1020 

TATGGGTGGG GCCTAGCACA AACTTGATAC AGCTAGGATA AAGTTAGAAC GATGACTGAT 1080

CTACTGTAAA GCGACACCTG TCCTGTTATG GTAGTTTAAG TCCATTCCTG GACGACTCCA 1140 

GATCCAGGAT ATGATGCTGT TACATAATGC GATTGTTCAC AATAAAATTG CATGATGTTC 1200 

TTCTACTCTT TAGGCAGTTT TGTTCAACAG GCAAGTTGCA TAATGCATGT GCATATATGA 1260 

GCAGCATAAT CATCAATTAA TCATAGGTTC GTCATTTTAG TTTCACTCCT TCACATTATT 1320 

CCAGCCCTTG AAGAAAAATG TAGCAGTGCT TGCTGTTTAA TAAGTGGCAG AGCTGTTTTC 1380 

ACTCCACCTA CGCTTGTCTA GGACCAAAAT TTTAATCTGT CACTTTGAGC TAAAACTGAA 1440 

GCACCAAACC GCTACAAAAG AACGTAGGAG CTGAATTGTA ACTTGATGGG ATTACTATAG 1500 

CAGTTGCTAC AGTTCTAGCT AGCTACCTTA TTCTATACGC ATCACCCTAA CAACCCGGCT 1560

GACTGCTGCA TCTGACCCCA CCGTCCCCTG CTCCAAACCA ACTCTCCTTT CCTTGCATGC 1620 

ACTACACCCA CTTCCTGCAG CTATATATAC CACCATATGC CCATCTTATG AAACCATCCA 1680 

CAAGAGGAGA AGAAACAATC AACCAGCAAC ACTCTTCTCT TATAACATAG TACAGCGAAG 1740 

GAGATCCTGA CTGCTTTGTC AAGGTTCAAT TCTGCTTCCT CTGTTATGTT CTTTATATTA 1800 

CATGCTCTGA CAAAGCTATA AAGCTTGATA CTGCAGTATA ATATAACAAG TTAGCTACAC 1860 

AAGTTTTGTA CTTCAAGTCT TTTAACTATA TGTTGGTGCA ATAAGATTAT GAGTAATCCA 1920 

TATGAAGGTG TTGCAAGAGA ACATGAAAGG CAAAGATAAA CGGATGAACC CATTACTAGC 1980 

TTTGGCTGTA TCAGACCAAT AACTTGAAAT GCACTTGTGC TAGCATGCCT AAGTATTAGA 2040

AAAGGTAGCA TGGGAGAATC TATATTATTT TGGCTAACTT CTTTAGTTAC TATTGATTGA 2100 
TGAGAAAGCC TACCATTGCC CATGCCAGCC CTAATGTCCC GGTGACATGA TTGAGCCAGT 2160 
ACTATGATTA ATTTACTCTA TTGTTCTCCT TTTTTGAGTG CTGTATAAGA TGTCCTTTTT 2220 
TTGAGCCACT CGAGAAGATG TTTACTTAAC TCTAGTGCGC AATGATTGGA GCTCTCAGTG 2280 
CAACGCATGT GCTCTGTAAT CTACTGTCAC CACTACTCTG TAGTGTGTGC TTAAACTCTA 2340 
AACTATTCCA CGTGGCTAGT AATTACCAAT CATTTACAAC ACTGTTACAT GTGTAGGGCT 2400

GCGATCCATG GTCCGTCCTG TAGAAACCCC AACCCGTGAA ATCAAAAAAC TCGACGGCCT 2460 

GTGGGCATTC AGTCTGGATC GCGAAAACTG TGGAATTGAT CAGCGTTGGT GGGAAAGCGC 2520

GTTACAAGAA AGCCGGGCAA TTGCTGTGCC AGGCAGTTTT AACGATCAGT TCGCCGATGC 2580 

AGATATTCGT AATTATGCGG GCAACGTCTG GTATCAGCGC GAAGTCTTTA TACCGAAAGG 2640 

TTGGGCAGGC CAGCGTATCG TGCTGCGTTT CGATGCGGTC ACTCATTACG GCAAAGTGTG 2700 

GGTCAATAAT CAGGAAGTGA TGGAGCATCA GGGCGGCTAT ACGCCATTTG AAGCCGATGT 2760 

CACGCCGTAT GTTATTGCCG GGAAAAGTGT ACGTATCACC GTTTGTGTGA ACAACGAACT 2820 

GAACTGGCAG ACTATCCCGC CGGGAATGGT GATTACCGAC GAAAACGGCA AGAAAAAGCA 2880

GTCTTACTTC CATGATTTCT TTAACTATGC CGGAATCCAT CGCAGCGTAA TGCTCTACAC 2940 

CACGCCGAAC ACCTGGGTGG ACGATATCAC CGTGGTGACG CATGTCGCGC AAGACTGTAA 3000 

CCACGCGTCT GTTGACTGGC AGGTGGTGGC CAATGGTGAT GTCAGCGTTG AACTGCGTGA 3060 

TGCGGATCAA CAGGTGGTTG CAACTGGACA AGGCACTAGC GGGACTTTGC AAGTGGTGAA 3120

TCCGCACCTC TGGCAACCGG GTGAAGGTTA TCTCTATGAA CTGTGCGTCA CAGCCAAAAG 3180 

CCAGACAGAG TGTGATATCT ACCCGCTTCG CGTCGGCATC CGGTCAGTGG CAGTGAAGGG 3240

CGAACAGTTC CTGATTAACC ACAAACCGTT CTACTTTACT GGCTTTGGTC GTCATGAAGA 3300

TGCGGACTTA CGTGGCAAAG GATTCGATAA CGTGCTGATG GTGCACGACC ACGCATTAAT 3360 

GGACTGGATT GGGGCCAACT CCTACCGTAC CTCGCATTAC CCTTACGCTG AAGAGATGCT 3420

CGACTGGGCA GATGAACATG GCATCGTGGT GATTGATGAA ACTGCTGCTG TCGGCTTTAA 3480

CCTCTCTTTA GGCATTGGTT TCGAAGCGGG CAACAAGCCG AAAGAACTGT ACAGCGAAGA 3540 

GGCAGTCAAC GGGGAAACTC AGCAAGCGCA CTTACAGGCG ATTAAAGAGC TGATAGCGCG 3600 

TGACAAAAAC CACCCAAGCG TGGTGATGTG GAGTATTGCC AACGAACCGG ATACCCGTCC 3660 

GCAAGTGCAC GGGAATATTT CGCCACTGGC GGAAGCAACG CGTAAACTCG ACGCGACGCG 3720 

TCCGATCACC TGCGTCAATG TAATGTTCTG CGACGCTCAC ACCGATACCA TCAGCGATCT 3780

CTTTGATGTG CTGTGCCTGA ACCGTTATTA CGGATGGTAT GTCCAAAGCG GCGATTTGGA 3840

AACGGCAGAG AAGGTACTGG AAAAAGAACT TCTGGCCTGG CAGGAGAAAC TGCATCAGCC 3900 

GATTATCATC ACCGAATACG GCGTGGATAC GTTAGCCGGG CTGCACTCAA TGTACACCGA 3960 

CATGTGGAGT GAAGAGTATC AGTGTGCATG GCTGGATATG TATCACCGCG TCTTTGATCG 4020 

CGTCAGCGCC GTCGTCGGTG AACAGGTATG GAATTTCGCC GATTTTGCGA CCTCGCAAGG 4080 

CATATTGCGC GTTGGCGGTA ACAAGAAAGG GATCTTCACT CGCGACCGCA AACCGAAGTC 4140 

GGCGGCTTTT CTGCTGCAAA AACGCTGGAC TGGCATGAAC TTCGGTGAAA AACCGCAGCA 4200 

GGGAGGCAAA CAATGAATCA ACAACTCTCC TGGCGCACCA TCGTCGGCTA CAGCCTCGGG 4260

AATTGCTACC GAGCTTCTCG AGGGCACTGA AGTCGCTTGA TGTGCTGAAT TGTTTGTGAT 4320 

GTTGGTGGCG TATTTTGTTT AAATAAGTAA GCATGGCTGT GATTTTATCA TATGATCGAT 4380

CTTTGGGGTT TTATTTAACA CATTGTAAAA TGTGTATCTA TTAATAACTC AATGTATAAG 4440 

ATGTGTTCAT TCTTCGGTTG CCATAGATCT GCTTATTTGA CCTGTGATGT TTTGACTCCA 4500 

AAAACCAAAA TCACAACTCA ATAAACTCAT GGAATATGTC CACCTGTTTC TTGAAGAGTT 4560 

CATCTACCAT TCCAGTTGGC ATTTATCAGT GTTGCAGCGG CGCTGTGCTT TGTAACATAA 4620 
CAATTGTTCA CGGCATATAT CCAAATCTAG AGAAGCTTAT CGATACCGTC GACCTCGAGG 4680
GGGGGCCCGG TACCCAATTC GCCCTATAGT GAGTCGTATT ACAATTCACT GGCCGTCGTT 4740
TTACAACGTC GTGACTGGGA AAACCCTGGC GTTACCCAAC TTAATCGCCT TGCAGCACAT 4800
CCCCCTTTCG CCAGAAACGC CCGGGCATTT AAATGGCGCG CCGCGATCGC TTGCAGATCT 4860
GCATGGGTGG AGACTTTTCA ACAAAGGGTA ATATCCGGAA ACCTCCTCGG ATTCCATTGC 4920
CCAGCTATCT GTCACTTTAT TGTGAAGATA GTGGAAAAGG AAGGTGGCTC CTACAAATGC 4980
CATCATTGCG ATAAAGGAAA GGCCATCGTT GAAGATGCCT CTGCCGACAG TGGTCCCAAA 5040
GATGGACCCC CACCCACGAG GAGCATCGTG GAAAAAGAAG ACGTTCCAAC CACGTCTTCA 5100
AAGCAAGTGG ATTGATGTGA TCATCGATGG AGACTTTTCA ACAAAGGGTA ATATCCGGAA 5160
ACCTCCTCGG ATTCCATTGC CCAGCTATCT GTCACTTTAT TGTGAAGATA GTGGAAAAGG 5220
AAGGTGGCTC CTACAAATGC CATCATTGCG ATAAAGGAAA GGCCATCGTT GAAGATGCCT 5280
CTGCCGACAG TGGTCCCAAA GATGGACCCC CACCCACGAG GAGCATCGTG GAAAAAGAAG 5340
ACGTTCCAAC CACGTCTTCA AAGCAAGTGG ATTGATGTGA TATCTCCACT GACGTAAGGG 5400
ATGACGCACA ATCCCACTAT CCTTCGCAAG ACCCTTCCTC TATATAAGGA AGTTCATTTC 5460
ATTTGGAGAG AACACGGGGG ACTCTAGAGG ATCCAGCTGA AGGCTCGACA AGGCAGTCCA 5520
CGGAGGAGCT GATATTTGGT GGACAAGCTG TGGATAGGAG CAACCCTATC CCTAATATAC 5580
CAGCACCACC AAGTCAGGGC AATCCCCAGA TCAAGTGCAA AGGTCCGCCT TGTTTCTCCT 5640
CTGTCTCTTG ATCTGACTAA TCTTGGTTTA TGATTCGTTG AGTAATTTTG GGGAAAGCTC 5700
CTTTGCTGCT CCACACATGT CCATTCGAAT TTTACCGTGT TTAGCAAGGG CGAAAAGTTT 5760
GCATCTTGAT GATTTAGCTT GACTATGCGA TTGCTTTCCT GGACCCGTGC AGCTGCGGAC 5820
GGATCTGGGG CCATTTGTTC CAGGCACGGG ATAAGCATTC AGCCATGGCC CCAGAACGAC 5880
GCCCGGCCGA CATCCGCCGT GCCACCGAGG CGGACATGCC GGCGGTCTGC ACCATCGTCA 5940
ACCACTACAT CGAGACAAGC ACGGTCAACT TCCGTACCGA GCCGCAGGAA CCGCAGGAGT 6000
GGACGGACGA CCTCGTCCGT CTGCGGGAGC GCTATCCCTG GCTCGTCGCC GAGGTGGACG 6060
GCGAGGTCGC CGGCATCGCC TACGCGGGCC CCTGGAAGGC ACGCAACGCC TACGACTGGA 6120
CGGCCGAGTC GACCGTGTAC GTCTCCCCCC GCCACCAGCG GACGGGACTG GGCTCCACGC 6180
TCTACACCCA CCTGCTGAAG TCCCTGGAGG CACAGGGCTT CAAGAGCGTG GTCGCTGTCA 6240
TCGGGCTGCC CAACGACCCG AGCGTGCGCA TGCACGAGGC GCTCGGATAT GCCCCCCGCG 6300
GCATGCTGCG GGCGGCCGGC TTCAAGCACG GGAACTGGCA TGACGTGGGT TTCTGGCAGC 6360
TGGACTTCAG CCTGCCGGTA CCGCCCCGTC CGGTCCTGCC CGTCACCGAA ATCTGATGAG 6420
ATCTGAGCTC GAATTTCCCC GATCGTTCAA ACATTTGGCA ATAAAGTTTC TTAAGATTGA 6480
ATCCTGTTGC CGGTCTTGCG ATGATTATCA TATAATTTCT GTTGAATTAC GTTAAGCATG 6540
TAATAATTAA CATGTAATGC ATGACGTTAT TTATGAGATG GGTTTTTATG ATTAGAGTCC 6600
CGCAATTATA CATTTAATAC GCGATAGAAA ACAAAATATA GCGCGCAAAC TAGGATAAAT 6660
TATCGCGCGC GGTGTCATCT ATGTTACTAG ATCGATCGGG AATTCACTGG CCGTCGTTTT 6720
ACAACGTCGT GACTGGGAAA ACCCTGGCGT TACCCAACTT AATCGCCTTG CAGCACATCC 6780
CCCTTTCGCC AGCTGGCGTA ATAGCGAAGA GGCCCGCACC GATCGCCCTT CCCAACAGTT 6840
GCGCAGCCTG AATGGCGAAT GGCGCCTGAT GCGGTATTTT CTCCTTACGC ATCTGTGCGG 6900
TATTTCACAC CGCATATGGT GCACTCTCAG TACAATCTGC TCTGATGCCG CATAGTTAAG 6960
CCAGCCGCGA CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC 7020
ATCCGCTTAC AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC 7080
GTCATCACCG AAACGCGCGA GACGAAAGGG CCTCGTGATA CGCCTATTTT TATAGGTTAA 7140
TGTCATGATA ATAATGGTTT CTTAGACGTC AGGTGGCACT TTTCGGGGAA ATGTGCGCGG 7200
AACCCCTATT TGTTTATTTT TCTAAATACA TTCAAATATG TATCCGCTCA TGAGACAATA 7260
ACCCTGATAA ATGCTTCAAT AATATTGAAA AAGGAAGAGT ATGAGTATTC AACATTTCCG 7320
TGTCGCCCTT ATTCCCTTTT TTGCGGCATT TTGCCTTCCT GTTTTTGCTC ACCCAGAAAC 7380
GCTGGTGAAA GTAAAAGATG CTGAAGATCA GTTGGGTGCA CGAGTGGGTT ACATCGAACT 7440
GGATCTCAAC AGCGGTAAGA TCCTTGAGAG TTTTCGCCCC GAAGAACGTT TTCCAATGAT 7500
GAGCACTTTT AAAGTTCTGC TATGTGGCGC GGTATTATCC CGTATTGACG CCGGGCAAGA 7560
GCAACTCGGT CGCCGCATAC ACTATTCTCA GAATGACTTG GTTGAGTACT CACCAGTCAC 7620
AGAAAAGCAT CTTACGGATG GCATGACAGT AAGAGAATTA TGCAGTGCTG CCATAACCAT 7680
GAGTGATAAC ACTGCGGCCA ACTTACTTCT GACAACGATC GGAGGACCGA AGGAGCTAAC 7740
CGCTTTTTTG CACAACATGG GGGATCATGT AACTCGCCTT GATCGTTGGG AACCGGAGCT 7800
GAATGAAGCC ATACCAAACG ACGAGCGTGA CACCACGATG CCTGTAGCAA TGGCAACAAC 7860
GTTGCGCAAA CTATTAACTG GCGAACTACT TACTCTAGCT TCCCGGCAAC AATTAATAGA 7920
CTGGATGGAG GCGGATAAAG TTGCAGGACC ACTTCTGCGC TCGGCCCTTC CGGCTGGCTG 7980
GTTTATTGCT GATAAATCTG GAGCCGGTGA GCGTGGGTCT CGCGGTATCA TTGCAGCACT 8040
GGGGCCAGAT GGTAAGCCCT CCCGTATCGT AGTTATCTAC ACGACGGGGA GTCAGGCAAC 8100
TATGGATGAA CGAAATAGAC AGATCGCTGA GATAGGTGCC TCACTGATTA AGCATTGGTA 8160
ACTGTCAGAC CAAGTTTACT CATATATACT TTAGATTGAT TTAAAACTTC ATTTTTAATT 8220
TAAAAGGATC TAGGTGAAGA TCCTTTTTGA TAATCTCATG ACCAAAATCC CTTAACGTGA 8280
GTTTTCGTTC CACTGAGCGT CAGACCCCGT AGAAAAGATC AAAGGATCTT CTTGAGATCC 8340
TTTTTTTCTG CGCGTAATCT GCTGCTTGCA AACAAAAAAA CCACCGCTAC CAGCGGTGGT 8400
TTGTTTGCCG GATCAAGAGC TACCAACTCT TTTTCCGAAG GTAACTGGCT TCAGCAGAGC 8460
GCAGATACCA AATACTGTCC TTCTAGTGTA GCCGTAGTTA GGCCACCACT TCAAGAACTC 8520
TGTAGCACCG CCTACATACC TCGCTCTGCT AATCCTGTTA CCAGTGGCTG CTGCCAGTGG 8580
CGATAAGTCG TGTCTTACCG GGTTGGACTC AAGACGATAG TTACCGGATA AGGCGCAGCG 8640
GTCGGGCTGA ACGGGGGGTT CGTGCACACA GCCCAGCTTG GAGCGAACGA CCTACACCGA 8700
ACTGAGATAC CTACAGCGTG AGCATTGAGA AAGCGCCACG CTTCCCGAAG GGAGAAAGGC 8760
GGACAGGTAT CCGGTAAGCG GCAGGGTCGG AACAGGAGAG CGCACGAGGG AGCTTCCAGG 8820
GGGAAACGCC TGGTATCTTT ATAGTCCTGT CGGGTTTCGC CACCTCTGAC TTGAGCGTCG 8880
ATTTTTGTGA TGCTCGTCAG GGGGGCGGAG CCTATGGAAA AACGCCAGCA ACGCGGCCTT 8940
TTTACGGTTC CTGGCCTTTT GCTGGCCTTT TGCTCACATG TTCTTTCCTG CGTTATCCCC 9000
TGATTCTGTG GATAACCGTA TTACCGCCTT TGAGTGAGCT GATACCGCTC GCCGCAGCCG 9060
AACGACCGAG CGCAGCGAGT CAGTGAGCGA GGAAGCGGAA GAGCGCCCAA TACGCAAACC 9120
GCCTCTCCCC GCGCGTTGGC CGATTCATTA ATGCAGCTGG CACGACAGGT TTCCCGACTG 9180
GAAAGCGGGC AGTGAGCGCA ACGCAATTAA TGTGAGTTAG CTCACTCATT AGGCACCCCA 9240
GGCTTTACAC TTTATGCTTC CGGCTCGTAT GTTGTGTGGA ATTGTGAGCG GATAACAATT 9300
TCACACAGGA AACAGCTATG ACCATGATTA CGCCA 9335 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

GGGGGATCCT CTAGACAATG ATATACATAG ATAAAAACC 39 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

GGGAGATCTC CTTCGCTGTA CTATGTTATA AGAGAAGAG 39 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

GGGGGATCCT GACTGCTTTG TCAAGGTTCA ATTCTGCTT 39 
(2) INFORMATION FOR SEQ ID NO: 23: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 39 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

GGGCCATGGA TCGCAGCCCT ACACATGTAA CAGTGTTGT 39

(2) INFORMATION FOR SEQ ID NO: 24: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

AAAGAGCTCT GAGGGCACTG AAGTCGCTTG ATGTGC 36

(2) INFORMATION FOR SEQ ID NO: 25: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

GGGGAATTCT TGGATATATG CCGTGAACAA TTGTTATGTT AC 42 
(2) INFORMATION FOR SEQ ID NO: 26: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5897 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 



AGCTTGCATG CCTGCAGATC TGCATGGGTG GAGACTTTTC AACAAAGGGT AATATCCGGA 60
AACCTCCTCG GATTCCATTG CCCAGCTATC TGTCACTTTA TTGTGAAGAT AGTGGAAAAG 120
GAAGGTGGCT CCTACAAATG CCATCATTGC GATAAAGGAA AGGCCATCGT TGAAGATGCC 180
TCTGCCGACA GTGGTCCCAA AGATGGACCC CCACCCACGA GGAGCATCGT GGAAAAAGAA 240
GACGTTCCAA CCACGTCTTC AAAGCAAGTG GATTGATGTG ATCATCGATG GAGACTTTTC 300
AACAAAGGGT AATATCCGGA AACCTCCTCG GATTCCATTG CCCAGCTATC TGTCACTTTA 360
TTGTGAAGAT AGTGGAAAAG GAAGGTGGCT CCTACAAATG CCATCATTGC GATAAAGGAA 420
AGGCCATCGT TGAAGATGCC TCTGCCGACA GTGGTCCCAA AGATGGACCC CCACCCACGA 480
GGAGCATCGT GGAAAAAGAA GACGTTCCAA CCACGTCTTC AAAGCAAGTG GATTGATGTG 540
ATATCTCCAC TGACGTAAGG GATGACGCAC AATCCCACTA TCCTTCGCAA GACCCTTCCT 600
CTATATAAGG AAGTTCATTT CATTTGGAGA GAACACGGGG GACTCTAGAG GATCCAGCTG 660
AAGGCTCGAC AAGGCAGTCC ACGGAGGAGC TGATATTTGG TGGACAAGCT GTGGATAGGA 720

GCAACCCTAT CCCTAATATA CCAGCACCAC CAAGTCAGGG CAATCCCCAG ATCAAGTGCA 780 

AAGGTCCGCC TTGTTTCTCC TCTGTCTCTT GATCTGACTA ATCTTGGTTT ATGATTCGTT 840 

GAGTAATTTT GGGGAAAGCT CCTTTGCTGC TCCACACATG TCCATTCGAA TTTTACCGTG 900 

TTTAGCAAGG GCGAAAAGTT TGCATCTTGA TGATTTAGCT TGACTATGCG ATTGCTTTCC 960 

TGGACCCGTG CAGCTGCGGA CGGATCTGGG GCCATTTGTT CCAGGCACGG GATAAGCATT 1020

CAGCCATGGT CCGTCCTGTA GAAACCCCAA CCCGTGAAAT CAAAAAACTC GACGGCCTGT 1080 

GGGCATTCAG TCTGGATCGC GAAAACTGTG GAATTGATCA GCGTTGGTGG GAAAGCGCGT 1140

TACAAGAAAG CCGGGCAATT GCTGTGCCAG GCAGTTTTAA CGATCAGTTC GCCGATGCAG 1200 

ATATTCGTAA TTATGCGGGC AACGTCTGGT ATCAGCGCGA AGTCTTTATA CCGAAAGGTT 1260 

GGGCAGGCCA GCGTATCGTG CTGCGTTTCG ATGCGGTCAC TCATTACGGC AAAGTGTGGG 1320 

TCAATAATCA GGAAGTGATG GAGCATCAGG GCGGCTATAC GCCATTTGAA GCCGATGTCA 1380 

CGCCGTATGT TATTGCCGGG AAAAGTGTAC GTATCACCGT TTGTGTGAAC AACGAACTGA 1440 

ACTGGCAGAC TATCCCGCCG GGAATGGTGA TTACCGACGA AAACGGCAAG AAAAAGCAGT 1500 

CTTACTTCCA TGATTTCTTT AACTATGCCG GAATCCATCG CAGCGTAATG CTCTACACCA 1560 

CGCCGAACAC CTGGGTGGAC GATATCACCG TGGTGACGCA TGTCGCGCAA GACTGTAACC 1620 

ACGCGTCTGT TGACTGGCAG GTGGTGGCCA ATGGTGATGT CAGCGTTGAA CTGCGTGATG 1680 

CGGATCAACA GGTGGTTGCA ACTGGACAAG GCACTAGCGG GACTTTGCAA GTGGTGAATC 1740 

CGCACCTCTG GCAACCGGGT GAAGGTTATC TCTATGAACT GTGCGTCACA GCCAAAAGCC 1800 

AGACAGAGTG TGATATCTAC CCGCTTCGCG TCGGCATCCG GTCAGTGGCA GTGAAGGGCG 1860 

AACAGTTCCT GATTAACCAC AAACCGTTCT ACTTTACTGG CTTTGGTCGT CATGAAGATG 1920 

CGGACTTACG TGGCAAAGGA TTCGATAACG TGCTGATGGT GCACGACCAC GCATTAATGG 1980

ACTGGATTGG GGCCAACTCC TACCGTACCT CGCATTACCC TTACGCTGAA GAGATGCTCG 2040 

ACTGGGCAGA TGAACATGGC ATCGTGGTGA TTGATGAAAC TGCTGCTGTC GGCTTTAACC 2100 

TCTCTTTAGG CATTGGTTTC GAAGCGGGCA ACAAGCCGAA AGAACTGTAC AGCGAAGAGG 2160 

CAGTCAACGG GGAAACTCAG CAAGCGCACT TACAGGCGAT TAAAGAGCTG ATAGCGCGTG 2220 

ACAAAAACCA CCCAAGCGTG GTGATGTGGA GTATTGCCAA CGAACCGGAT ACCCGTCCGC 2280 

AAGTGCACGG GAATATTTCG CCACTGGCGG AAGCAACGCG TAAACTCGAC CCGACGCGTC 2340 

CGATCACCTG CGTCAATGTA ATGTTCTGCG ACGCTCACAC CGATACCATC AGCGATCTCT 2400 

TTGATGTGCT GTGCCTGAAC CGTTATTACG GATGGTATGT CCAAAGCGGC GATTTGGAAA 2460 

CGGCAGAGAA GGTACTGGAA AAAGAACTTC TGGCCTGGCA GGAGAAACTG CATCAGCCGA 2520 

TTATCATCAC CGAATACGGC GTGGATACGT TAGCCGGGCT GCACTCAATG TACACCGACA 2580 

TGTGGAGTGA AGAGTATCAG TGTGCATGGC TGGATATGTA TCACCGCGTC TTTGATCGCG 2640 

TCAGCGCCGT CGTCGGTGAA CAGGTATGGA ATTTCGCCGA TTTTGCGACC TCGCAAGGCA 2700 

TATTGCGCGT TGGCGGTAAC AAGAAAGGGA TCTTCACTCG CGACCGCAAA CCGAAGTCGG 2760 

CGGCTTTTCT GCTGCAAAAA CGCTGGACTG GCATGAACTT CGGTGAAAAA CCGCAGCAGG 2820 

GAGGCAAACA ATGAATCAAC AACTCTCCTG GCGCACCATC GTCGGCTACA GCCTCGGTGG 2880 

GGAATTGGAG AGCTCTGAGG GCACTGAAGT CGCTTGATGT GCTGAATTGT TTGTGATGTT 2940 
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GGTGGCGTAT 


TTTGTTTAAA 


TAAGTAAGCA 


TGGCTGTGAT 


TTTATCATAT 


GATCGATCTT 


3000 


TGGGGTTTTA 


TTTAACACAT 


TGTAAAATGT 


GTATCTATTA 


ATAACTCAAT 


GTATAAGATG 


3060 


TGTTCATTCT 


TCGGTTGCCA 


TAGATCTGCT 


TATTTGACCT 


GTGATGTTTT 


GACTCCAAAA 


3120 


ACCAAAATCA 


CAACTCAATA 


AACTCATGGA 


ATATGTCCAC 


CTGTTTCTTG 


AAGAGTTCAT 


3180 


CTACCATTCC 


AGTTGGCATT 


TATCAGTGTT 


GCAGCGGCGC 


TGTGCTTTGT 


AACATAACAA 


3240 


TTGTTCACGG 


CATATATCCA 


AGAATTCACT 


GGCCGTCGTT 


TTACAACGTC 


GTGACTGGGA 


3300 


AAACCCTGGC 


GTTACCCAAC 


TTAATCGCCT 


TGCAGCACAT CCCCCTTTCG 


CGAGCTGGCG 


3360 


TAATAGCGAA 


GAGGCCCGCA 


CCGATCGCCC 


TTCCCAACAG 


TTGCGCAGCC 


TGAATGGCGA 


3420 


ATGGCGCCTG 


ATGCGGTATT 


TTCTCCTTAC 


GCATCTGTGC 


GGTATTTCAC 


ACCGCATATG 


3480 


GTGCACTCTC 


AGTACAATCT 


GCTCTGATGC 


CGCATAGTTA 


AGCCAGCCCC 


GACACCCGCC 


3540 


AACACCCGCT 


GACGCGCCCT 


GACGGGCTTG 


TCTGCTCCCG 


GCATCCGCTT 


ACAGACAAGC 


3600 


TGTGACCGTC 


TCCGGGAGCT 


GCATGTGTCA 


GAGGTTTTCA 


CCGTCATCAC 


CGAAACGCGC 


3660 


GAGACGAAAG 


GGCCTCGTGA 


TACGCCTATT 


TTTATAGGTT 


AATGTCATGA 


TAATAATGGT 


3720 


TTCTTAGACG 


TCAGGTGGCA 


CTTTTCGGGG 


AAATGTGCGC 


GGAACCCCTA 


TTTGTTTATT 


3780 


TTTCTAAATA 


CATTCAAATA 


TGTATCCGCT 


CATGAGACAA 


TAACCCTGAT 


AAATGCTTCA 


3840 


ATAATATTGA 


AAAAGGAAGA 


GTATGAGTAT 


TCAACATTTC 


CGTGTCGCCC 


TTATTCCCTT 


3900 


TTTTGCGGCA 


TTTTGCCTTC 


CTGTTTTTGC 


TCACCCAGAA 


ACGCTGGTGA 


AAGTAAAAGA 


3960 


TGCTGAAGAT 


CAGTTGGGTG 


CACGAGTGGG 


TTACATCGAA 


CTGGATCTCA 


ACAGCGGTAA 


. 4020 


GATCCTTGAG 


AGTTTTCGCC 


CCGAAGAACG 


TTTTCCAATG 


ATGAGCACTT 


TTAAAGTTCT 


4080 


GCTATGTGGC 


GCGGTATTAT 


CCCGTATTGA 


CGCCGGGCAA 


GAGCAACTCG 


GTCGCCGCAT 


4140 


ACACTATTCT 


CAGAATGACT 


TGGTTGAGTA 


CTCACCAGTC 


ACAGAAAAGC 


ATCTTACGGA 


4200 


TGGCATGACA 


GTAAGAGAAT 


TATGCAGTGC 


TGCCATAACC 


ATGAGTGATA 


ACACTGCGGC ' 


4260 


CAACTTACTT 


CTGACAACGA 


TCGGAGGACC 


GAAGGAGCTA 


ACCGCTTTTT 


TGCACAACAT 


4320 


GGGGGATCAT 


GTAACTCGCC 


TTGATCGTTG 


GGAACCGGAG 


CTGAATGAAG 


CCATACCAAA 


4380 


CGACGAGCGT 


GACACCACGA 


TGCCTGTAGC 


AATGGCAACA 


ACGTTGCGCA 


AACTATTAAC 


4440 


TGGCGAACTA 


CTTACTCTAG 


CTTCCCGGCA 


ACAATTAATA 


GACTGGATGG 


AGGCGGATAA 


4500 


AGTTGCAGGA 


CCACTTCTGC 


GCTCGGCCCT 


TCCGGCTGGC 


TGGTTTATTG 


CTGATAAATC 


4560 


TGGAGCCGGT 


GAGCGTGGGT 


CTCGCGGTAT 


CATTGCAGCA 


CTGGGGCCAG 


ATGGTAAGCC 


4620 


CTCCCGTATC 


GTAGTTATCT 


ACACGACGGG 


GAGTCAGGCA 


ACTATGGATG 


AACGAAATAG 


4680 


ACAGATCGCT 


GAGATAGGTG 


CCTCACTGAT 


TAAGCATTGG 


TAACTGTCAG 


ACCAAGTTTA 


4740 


CTCATATATA 


CTTTAGATTG 


ATTTAAAACT 


TCATTTTTAA 


TTTAAAAGGA 


TCTAGGTGAA 


4800 


GATCCTTTTT 


GATAATCTCA 


TGACCAAAAT 


CCCTTAACGT 


GAGTTTTCGT 


TCCACTGAGC 


4860 


GTCAGACCCC 


GTAGAAAAGA 


TCAAAGGATC 


TTCTTGAGAT 


CCTTTTTTTC 


TGCGCGTAAT 


4920 


CTGCTGCTTG 


CAAACAAAAA 


AACCACCGCT 


ACCAGCGGTG 


GTTTGTTTGC 


CGGATCAAGA 


4980 


GCTACCAACT 


CTTTTTCCGA 


AGGTAACTGG 


CTTCAGCAGA 


GCGCAGATAC 


CAAATACTGT 


5040 


CCTTCTAGTG 


TAGCCGTAGT 


TAGGCCACCA 


CTTCAAGAAC 


TCTGTAGCAC 


CGCCTACATA 


5100 


CCTCGCTCTG 


CTAATCCTGT 


TACCAGTGGC 


TGCTGCCAGT 


GGCGATAAGT 


CGTGTCTTAC 


5160 


CGGGTTGGAC 


TCAAGACGAT 


AGTTACCGGA 


TAAGGCGCAG 


CGGTCGGGCT 


GAACGGGGGG 


5220


TTCGTGCACA


CAGCCCAGCT 


TGGAGCGAAC GACCTACACC GAACTGAGAT ACCTACAGCG 


5280 


TGAGCATTGA 


GAAAGCGCCA 


CGCTTCCCGA 


AGGGAGAAAG 


GCGGACAGGT 


ATCCGGTAAG 


5340 


CGGCAGGGTC 


GGAACAGGAG 


AGCGCACGAG 


GGAGCTTCCA 


GGGGGAAACG 


CCTGGTATCT 


5400 


TTATAGTCCT 


GTCGGGTTTC 


GCCACCTCTG 


ACTTGAGCGT 


CGATTTTTGT 


GATGCTCGTC 


5460 


AGGGGGGCGG 


AGCCTATGGA 


AAAACGCCAG 


CAACGCGGCC 


TTTTTACGGT 


TCCTGGCCTT 


5520 


TTGCTGGCCT 


TTTGCTCACA 


TGTTCTTTCC 


TGCGTTATCC 


CCTGATTCTG 


TGGATAACCG 


5580


TATTACCGCC 


TTTGAGTGAG 


CTGATACCGC 


TCGCCGCAGC 


CGAACGACCG 


AGCGCAGCGA 


5640 


GTCAGTGAGC


GAGGAAGCGG


AAGAGCGCCC 


AATACGCAAA 


CCGCCTCTCC 


CCGCGCGTTG 


5700 


GCCGATTCAT 


TAATGCAGCT 


GGCACGACAG 


GTTTCCCGAC 


TGGAAAGCGG 


GCAGTGAGCG 


5760 


CAACGCAATT 


AATGTGAGTT 


AGCTCACTCA 


TTAGGCACCC 


CAGGCTTTAC 


ACTTTATGCT 


5820 


TCCGGCTCGT 


ATGTTGTGTG 


GAATTGTGAG 


CGGATAACAA 


TTTCACACAG 


GAAACAGCTA 


5880 


TGACCATGAT 


TACGCCA 










5897 



(2) INFORMATION FOR SEQ ID NO: 27: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6898 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular
(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 








AGCTTGCATG 


CCTGCAGTGC 


AGCGTGACCC 


GGTCGTGCCC 


CTCTCTAGAG 


ATAATGAGCA 


60 


TTGCATGTCT 


AAGTTATAAA AAATTACCAC 


ATATTTTTTT 


TGTCACACTT 


GTTTGAAGTG 


120 


CAGTTTATCT 


ATCTTTATAC 


ATATATTTAA 


ACTTTAATCT 


ACGAATAATA 


TAATCTATAG 


180 


TACTACAATA 


ATATCAGTGT 


TTTAGAGAAT 


CATATAAATG 


AACAGTTAGA 


CATGGTCTAA 


240 


AGGACAATTG 


AGTATTTTGA 


CAACAGGACT 


CTACAGTTTT 


ATCTTTTTAG 


TGTGCATGTG 


300 


TTCTCCTTTT 


TTTTTGCAAA 


TAGCTTCACC 


TATATAATAC 


TTCATCCATT 


TTATTAGTAC 


360 


ATCCATTTAG 


GGTTTAGGGT 


TAATGGTTTT 


TATAGACTAA 


TTTTTTTAGT 


ACATCTATTT 


420 


TATTCTATTT 


TAGCCTCTAA 


ATTAAGAAAA 


CTAAAACTCT 


ATTTTAGTTT 


TTTTATTTAA 


480 


TAATTTAGAT 


ATAAAATAGA 


ATAAAATAAA 


GTGACTAAAA 


ATTAAACAAA 


TACCCTTTAA 


540 


GAAATTAAAA 


AAACTAAGGA AACATTTTTC 


TTGTTTCGAG 


TAGATAATGC 


CAGCCTGTTA 


600 


AACGCCGTCG 


ACGAGTCTAA 


CGGACACCAA 


CCAGCGAACC 


AGCAGCGTCG 


CGTCGGGCCA 


660 


AGCGAAGCAG 


ACGGCACGGC 


ATCTCTGTCG 


CTGCCTCTGG 


ACCCCTCTCG 


AGAGTTCCGC 


720 


TCCACCGTTG 


GACTTGCTCC 


GCTGTCGGCA 


TCCAGAAATT 


GCGTGGCGGA 


GCGGCAGACG 


780 


TGAGCCGGCA 


CGGCAGGCGG 


CCTCCTCCTC 


CTCTCACGGC 


ACGGCAGCTA 


CGGGGGATTC 


840 


CTTTCCCACC 


GCTCCTTCGC 


TTTCCCTTCC 


TCGCCCGCCG 


TAATAAATAG 


ACACCCCCTC 


900 


CACACCCTCT 


TTCCCCAACC 


TCGTGTTGTT 


CGGAGCGCAC 


ACACACACAA 


CCAGATCTCC 


960 


CCCAAATCCA 


CCCGTCGGCA 


CCTCCGCTTC 


AAGGTACGCC 


GCTCGTCCTC 


CCCCCCCCCC 


1020 


CCTCTCTACC 


TTCTCTAGAT 


CGGCGTTCCG 


GTCCATGCAT 


GGTTAGGGCC 


CGGTAGTTCT 


1080


ACTTCTGTTC




TAGATCCGTG 


TTTGTGTTAG 


A1CCGTGC1G 


G X AGJUG 1 X GG 


1140


TACACGGATG 


CGACCTGTAC 


^*imp»TV/'"«7\p'^/'~'P< 

G T C AG AC AC G 


1 1CXGAX X*jv_ 


rpTV tv PTTnrr Zi 


OXoX llv.lL! 


1200


TTGGGGAATC 


m^^ /■! TV m.^*l /"t ^"1 

CTGGGATGGC 


TCTAGCCGTT 


GGGGAGAGGG 


uAlLuAl lit 


a X brt X X X X X X 


1260


TTGTTTCGTT 


GCATAGGGTT 


TGGTxIGCCC 


ill ICC x 1 1 A 


111 LA/i 


1 UL X Uv»Hv, 


1320


TTGTTTGTCG 


GGTCATCTTT 


TCATGCTTTT 


TTTTGTCTTG 


G 1 1 G 1 G A 1 G A 


m /l rp f% rrr ^ m r%T* 

X G X GG 1 G 1 GG 


1380


TTGGGCGGTC 


GTTCTAGATC 


nf~t TV /-i m TV /"i TV TV m 

GGAGTAGAAT 


TCTGTTTCAA 


AGIAGGIGGI 


GGATTTATTA 


1440


ATTTTGGATC 


TGTATGTGTG 


TGCCATACAT 


ATTCATAGTT 


TV TV TV 1 1 III TV TV 

ACGAATTGAA 


P« A T>P A W P7AT 1 

GAIGAIGGAI 


1500


GGAAATATCG 


ATCTAGGATA 


GGTATACATG 


mil iw tv m/i/ a i/Tnn 

TTGATGCGGG 


TTTTACTGAT 


G CATAT ACAG 


1560 


AGATGCTTTT 


TGTTCGCTTG 


GTTGTGATGA 


TGTGGTGTGG 


TTGGGCGGTC 


GTTCATTCGT


1620


TCTAGATCGG 


AGTAGAATAC 


TGTTTCAAAC 


TACCTGGTGT 


ATTTATTAAT 


mrpnri/-»p» tv tv pmn 
TTTGGAACTG 


1680


TATGTGTGTG 


TCATACATCT 


TCATAGTTAC 


GAGTTTAAGA 


ITI/l/^THT/*»mV 7V TV 

TGGATGGAAA 


rn tv rpnp tv TPiP7\ 

TATCGATCTA 


1740


GGATAGGTAT 


ACATGTTGAT 


GTGGGTTTTA 


CTGATGCATA 


m 7V tv mo tv m 

TACATGATGG 


C A x A X G G AGC 


1800


ATCTATTCAT 


ATGCTCTAAC 


CTTGAGTACC 


TATCTATTAT 


AATAAACAAG 


TATGTTTTAT 


1860


AATTATTTTG 


ATCTTGATAT 


ACTTGGATGA 


TGGCATATGC 


tv Tv ^^^n TV m TV 

AG CAG CT AT A 


TGTGGATTTT 


1920


TTTAGCCCTG 


CCTTCATACG 


CTATTTATTT 


GCTTGGTACT 


GTTTCTTTTG 


TCGATGCTCA 


1980 


CCCTGTTGTT 


TGGTGTTACT 


TCTGCAGGGT 


ACCCCCGGGG 


T CG AC CATGG 


TCGGTCCTGT 


2040 


AGAAACCCCA 


ACCCGTGAAA 


TCAAAAAACT 


CGACGGCCTG 


TGGGGA1 1GA 


G xCTGGATCG 


2100


CGAAAACTGT 


GGAATTGATC 


AGCGTTGGTG 


GGAAAGCGCG 


rprp T\ TV 7\ y^i Tv 7\ tv 

TTACAAGAAA 


GGGGGGGAA1 


2160


TGCTGTGCCA 


GGCAGTTTTA 


tv /T/*» t\ nr/"*TV ^ mm 

ACGATCAGTT 


CGCCGATGCA 


GATATTCGTA 


Al X AIGGGGG 


2220


CAACGTCTGG 


TATCAGCGCG 


AAGTCTTTAT 


TV /—i t\ tv 7\ /~i /"^m 

ACCGAAAGGT 


1 GGGGAGGGG 


AGGGXAXGGX 


2280


GCTGCGTTTC 


GATGCGGTCA 


CTCATTACGG 


/t tv tv 7A /~»m/~»rri/™' O 

CAAAGTGTGG 


/~im07\ tv m tv tv m/^ 

GTGAATAA1 G 


A PIP 1 A APTPS T 
AGGAAG X G A 1 


2340


GGAGCATCAG 


GGCGGCTATA 


CGCCATTTGA 


AGCCGATGTC 


ACGCCGTATG 


mm T\ TTPPPPP 

Tx AT I GGGGG 


2400


GAAAAGTGTA 


CGTATCACCG 


TTTGTGTGAA 


CAACGAACTG 


Tv tv nmn pnii p»T\ 

AAC1 GGG AGA 


P" m A TPPPP P , P' 

G xAlCCCGCC 


2460


GGGAATGGTG 


ATTACCGACG 


AAAACGGCAA 


*-» TV TV TV TV TV /I TV 

GAAAAAGCAG 


TCTTACTTCC 


ATGATTTCTT 


2520


TAACTATGCC 


GGAATCCATC 


GCAGCGTAAT 


GCTCTACACC 


ACGCCGAAGA 


GG IGGGxGGA


2580


CGATATCACC 


GTGGTGACGC 


ATGTCGCGCA 


Tt. /~i tv r**ifn/*trn TV TV /™t 

AGACTGTAAC 


CACGCGTCTG 


rprp/^ TV prppip* P» A 

TTG AG TGG G A 


2640


GGTGGTGGCC 


AATGGTGATG 


TCAGCGTTGA 


tv nmnonmn TV rrr 

ACTGCGTGAT 


G CGG AT CAAG 


tv pir^ r r , p , p* r P r n^p' 
AGGiGGl XGG 


2700


AACTGGACAA 


GGCACTAGCG 


TV /'II III 1 IH 1/1 /"I TV 

GGACTTTGCA 


TV /Tmnnrp/i tv TV T 1 

AGTGGTGAAT 


rtf~>f~y /"» tv prim P'T 1 
CGGGAGG 1 G i 


G G G AAG G GGG 


2760


TGAAGGTTAT 


CTCTATGAAC 


TGTGCGTCAC 


Tin/^OTi TV TV TV /T f~* 

AGCCAAAAGC 


CAGAC AG AG 1 


PTPATATPTA 

GIGAXA1G XA 


2820


CCCGCTTCGC 


GTCGGCATCC 


GGTCAGTGGC 


Tv rirjy/^ 7v TV P^P*P* P 1 

AGTGAAGGGG 


GAACAG 1 1 GG 


TP A TT A A. pp A 


2880


CAAACCGTTC 


TACTTTACTG 


GC T xTGGTCG 


TCAx gaaga 1 


GGGGAG 1 1 AG 


pTpnpaa a rip! 


2940


tv mmnA 7k mTV tv 

ATTCGATAAC 


GTGCTGATGG 


«"P/"'P< A P»P< A P'P* A 

x G GAG GAG C A 


GGGAI lAAiG 


GAG 1 GVjA 1 X v? 




3000 


CTACCGTACC 


TCGCATTACC 


/-i mm tv /-i/-* r*TT\r% tv 

G 1 xACGC 1GA 


T\ f"1 TV /"I TV fn/T<-»mr« 

AGAGA1 GC 1 G 


GAG luooLAb 




3060


CATCGTGGTG 


ATTGATGAAA 


CTGCTGCTGT 


CGGCTTTAAC 


CTCTCTTTAG 


GCATTGGTTT 


3120 


CGAAGCGGGC 


AACAAGCCGA 


AAGAACTGTA 


CAGCGAAGAG 


GCAGTCAACG 


GGGAAACTCA 


3180 


GCAAGCGCAC 


TTACAGGCGA 


TTAAAGAGCT 


GATAGCGCGT 


GACAAAAACC 


ACCCAAGCGT 


3240 


GGTGATGTGG 


AGTATTGCCA 


ACGAACCGGA 


TACCCGTCCG 


CAAGTGCACG 


GGAATATTTC 


3300 


GCCACTGGCG 


GAAGCAACGC 


GTAAACTCGA 


CCCGACGCGT 


CCGATCACCT 


GCGTCAATGT 


3360
AATGTTCTGC GACGCTCACA CCGATACCAT CAGCGATCTC TTTGATGTGC TGTGCCTGAA 3420
CCGTTATTAC GGATGGTATG TCCAAAGCGG CGATTTGGAA ACGGCAGAGA AGGTACTGGA 3480
AAAAGAACTT CTGGCCTGGC AGGAGAAACT GCATCAGCCG ATTATCATCA CCGAATACGG 3540
CGTGGATACG TTAGCCGGGC TGCACTCAAT GTACACCGAC ATGTGGAGTG AAGAGTATCA 3600
GTGTGCATGG CTGGATATGT ATCACCGCGT CTTTGATCGC GTCAGCGCCG TCGTCGGTGA 3660
ACAGGTATGG AATTTCGCCG ATTTTGCGAC CTCGCAAGGC ATATTGCGCG TTGGCGGTAA 3720
CAAGAAAGGG ATCTTCACTC GCGACCGCAA ACCGAAGTCG GCGGCTTTTC TGCTGCAAAA 3780
ACGCTGGACT GGCATGAACT TCGGTGAAAA ACCGCAGCAG GGAGGCAAAC AATGAATCAA 3840
CAACTCTCCT GGCGCACCAT CGTCGGCTAC AGCCTCGGTG GGGAATTGGA GAGCTCTGAG 3900
GGCACTGAAG TCGCTTGATG TGCTGAATTG TTTGTGATGT TGGTGGCGTA TTTTGTTTAA 3960
ATAAGTAAGC ATGGCTGTGA TTTTATCATA TGATCGATCT TTGGGGTTTT ATTTAACACA 4020
TTGTAAAATG TGTATCTATT AATAACTCAA TGTATAAGAT GTGTTCATTC TTCGGTTGCC 4080
ATAGATCTGC TTATTTGACC TGTGATGTTT TGACTCCAAA AACCAAAATC ACAACTCAAT 4140
AAACTCATGG AATATGTCCA CCTGTTTCTT GAAGAGTTCA TCTACCATTC CAGTTGGCAT 4200
TTATCAGTGT TGCAGCGGCG CTGTGCTTTG TAACATAACA ATTGTTCACG GCATATATCC 4260
AAGAATTCAC TGGCCGTCGT TTTACAACGT CGTGACTGGG AAAACCCTGG CGTTACCCAA 4320
CTTAATCGCC TTGCAGCACA TCCCCCTTTC GCCAGCTGGC GTAATAGCGA AGAGGCCCGC 4380
ACCGATCGCC CTTCCCAACA GTTGCGCAGC CTGAATGGCG AATGGCGCCT GATGCGGTAT 4440
TTTCTCCTTA CGCATCTGTG CGGTATTTCA CACCGCATAT GGTGCACTCT CAGTACAATC 4500
TGCTCTGATG CCGCATAGTT AAGCCAGCCC CGACACCCGC CAACACCCGC TGACGCGCCC 4560
TGACGGGCTT GTCTGCTCCC GGCATCCGCT TACAGACAAG CTGTGACCGT CTCCGGGAGC 4620
TGCATGTGTC AGAGGTTTTC ACCGTCATCA CCGAAACGCG CGAGACGAAA GGGCCTCGTG 4680
ATACGCCTAT TTTTATAGGT TAATGTCATG ATAATAATGG TTTCTTAGAC GTCAGGTGGC 4740
ACTTTTCGGG GAAATGTGCG CGGAACCCCT ATTTGTTTAT TTTTCTAAAT ACATTCAAAT 4800
ATGTATCCGC TCATGAGACA ATAACCCTGA TAAATGCTTC AATAATATTG AAAAAGGAAG 4860
AGTATGAGTA TTCAACATTT CCGTGTCGCC CTTATTCCCT TTTTTGCGGC ATTTTGCCTT 4920
CCTGTTTTTG CTCACCCAGA AACGCTGGTG AAAGTAAAAG ATGCTGAAGA TCAGTTGGGT 4980
GCACGAGTGG GTTACATCGA ACTGGATCTC AACAGCGGTA AGATCCTTGA GAGTTTTCGC 5040
CCCGAAGAAC GTTTTCCAAT GATGAGCACT TTTAAAGTTC TGCTATGTGG CGCGGTATTA 5100
TCCCGTATTG ACGCCGGGCA AGAGCAACTC GGTCGCCGCA TACACTATTC TCAGAATGAC 5160
TTGGTTGAGT ACTCACCAGT CACAGAAAAG CATCTTACGG ATGGCATGAC AGTAAGAGAA 5220
TTATGCAGTG CTGCCATAAC CATGAGTGAT AACACTGCGG CCAACTTACT TCTGACAACG 5280
ATCGGAGGAC CGAAGGAGCT AACCGCTTTT TTGCACAACA TGGGGGATCA TGTAACTCGC 5340
CTTGATCGTT GGGAACCGGA GCTGAATGAA GCCATACCAA ACGACGAGCG TGACACCACG 5400
ATGCCTGTAG CAATGGCAAC AACGTTGCGC AAACTATTAA CTGGCGAACT ACTTACTCTA 5460
GCTTCCCGGC AACAATTAAT AGACTGGATG GAGGCGGATA AAGTTGCAGG ACCACTTCTG 5520
CGCTCGGCCC TTCCGGCTGG CTGGTTTATT GCTGATAAAT CTGGAGCCGG TGAGCGTGGG 5580
TCTCGCGGTA TCATTGCAGC ACTGGGGCCA GATGGTAAGC CCTCCCGTAT CGTAGTTATC 5640
TACACGACGG GGAGTCAGGC AACTATGGAT GAACGAAATA GACAGATCGC TGAGATAGGT 5700
GCCTCACTGA TTAAGCATTG GTAACTGTCA GACCAAGTTT ACTCATATAT ACTTTAGATT 5760 
GATTTAAAAC TTCATTTTTA ATTTAAAAGG ATCTAGGTGA AGATCCTTTT TGATAATCTC 5820 
ATGACCAAAA TCCCTTAACG TGAGTTTTCG TTCCACTGAG CGTCAGACCC CGTAGAAAAG 5880 
ATCAAAGGAT CTTCTTGAGA TCCTTTTTTT CTGCGCGTAA TCTGCTGCTT GCAAACAAAA 5940
AAACCACCGC TACCAGCGGT GGTTTGTTTG CCGGATCAAG AGCTACCAAC TCTTTTTCCG 6000 
AAGGTAACTG GCTTCAGCAG AGCGCAGATA CCAAATACTG TCCTTCTAGT GTAGCCGTAG 6060 
TTAGGCCACC ACTTCAAGAA CTCTGTAGCA CCGCCTACAT ACCTCGCTCT GCTAATCCTG 6120 
TTACCAGTGG CTGCTGCCAG TGGCGATAAG TCGTGTCTTA CCGGGTTGGA CTCAAGACGA 6180 
TAGTTACCGG ATAAGGCGCA GCGGTCGGGC TGAACGGGGG GTTCGTGCAC ACAGCCCAGC 6240
TTGGAGCGAA CGACCTACAC CGAACTGAGA TACCTACAGC GTGAGCATTG AGAAAGCGCC 6300 
ACGCTTCCCG AAGGGAGAAA GGCGGACAGG TATCCGGTAA GCGGCAGGGT CGGAACAGGA 6360 
GAGCGCACGA GGGAGCTTCC AGGGGGAAAC GCCTGGTATC TTTATAGTCC TGTCGGGTTT 6420
CGCCACCTCT GACTTGAGCG TCGATTTTTG TGATGCTCGT CAGGGGGGCG GAGCCTATGG 6480 
AAAAACGCCA GCAACGCGGC CTTTTTACGG TTCCTGGCCT TTTGCTGGCC TTTTGCTCAC 6540
ATGTTCTTTC CTGCGTTATC CCCTGATTCT GTGGATAACC GTATTACCGC CTTTGAGTGA 6600
GCTGATACCG CTCGCCGCAG CCGAACGACC GAGCGCAGCG AGTCAGTGAG CGAGGAAGCG 6660
GAAGAGCGCC CAATACGCAA ACCGCCTCTC CCCGCGCGTT GGCCGATTCA TTAATGCAGC 6720
TGGCACGACA GGTTTCCCGA CTGGAAAGCG GGCAGTGAGC GCAACGCAAT TAATGTGAGT 6780
TAGCTCACTC ATTAGGCACC CCAGGCTTTA CACTTTATGC TTCCGGCTCG TATGTTGTGT 6840
GGAATTGTGA GCGGATAACA ATTTCACACA GGAAACAGCT ATGACCATGA TTACGCCA 6898 
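The number that closes each row of a sequence description is a running residue count, so a transcribed row can be checked mechanically. The following is a minimal sketch, assuming rows of space-separated blocks followed by an optional all-digit counter. The function name `join_rows` and its `start` parameter are illustrative and are not part of the listing. It is applied here to the last two rows of SEQ ID NO: 27.

```python
def join_rows(rows, start=0):
    """Concatenate listing rows, verifying each trailing running count.

    `start` is the number of residues that precede the first row given
    (a hypothetical convenience so that a partial listing can be checked).
    """
    residues = []
    total = start
    for row in rows:
        fields = row.split()
        # A final all-digit field is taken to be the running residue count.
        counter = int(fields.pop()) if fields and fields[-1].isdigit() else None
        block = "".join(fields).upper()
        if set(block) - set("ACGT"):
            raise ValueError(f"unexpected character in row: {row!r}")
        residues.append(block)
        total += len(block)
        if counter is not None and counter != total:
            raise ValueError(f"row ends at residue {total}, listing says {counter}")
    return "".join(residues)


# Last two rows of SEQ ID NO: 27; 6780 residues precede them.
tail_of_seq_27 = [
    "TAGCTCACTC ATTAGGCACC CCAGGCTTTA CACTTTATGC TTCCGGCTCG TATGTTGTGT 6840",
    "GGAATTGTGA GCGGATAACA ATTTCACACA GGAAACAGCT ATGACCATGA TTACGCCA 6898",
]
tail = join_rows(tail_of_seq_27, start=6780)
print(len(tail))
```

A row whose counter disagrees with the accumulated length raises `ValueError`, which is one way to flag a dropped or duplicated block.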
(2) INFORMATION FOR SEQ ID NO: 28: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

CAGATCTGCA GATCTGCATG GGCGATG 27 
(2) INFORMATION FOR SEQ ID NO: 29: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
GGGGACTCTA GAGGATCCCC GGGTGGTCAG TCCCTT
(2) INFORMATION FOR SEQ ID NO: 30: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
GAATTTCCCC 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
GATCCGGATC CG 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 12 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
TCGACGGATC CG 

(2) INFORMATION FOR SEQ ID NO: 33: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GGGGACTCTA GAGGATCCCG AATTTCCCC 
(2) INFORMATION FOR SEQ ID NO: 34:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 57 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 

GATCCAGCTG AAGGCTCGAC AAGGCAGATC CACGGAGGAG CTGATATTTG GTGGACA 57 
(2) INFORMATION FOR SEQ ID NO: 35: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

AGCTTGTCCA CCAAATATCA GCTCCTCCGT GGATCTGCCT TGTCCAGCCT TCAGCTG 57 
(2) INFORMATION FOR SEQ ID NO: 36: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 64 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 

AGCTGTGGAT AGGAGCAACC CTATCCCTAA TATACCAGCA CCACCAAGTC AGGGCAATCC 60 
CGGG 64 
(2) INFORMATION FOR SEQ ID NO: 37: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 64 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

TCGACCCGGG ATTGCCCTGA CTTGGTGGTG CTGGTATATT AGGGATAGGG TTGCTCCTAT 60 
CCAC 64 
(2) INFORMATION FOR SEQ ID NO: 38:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 62 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

CCGGGCCATT TGTTCCAGGC ACGGGATAAG CATTCAGCCA TGGGATATCA AGCTTGGATC 
CC 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 62 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

TCGAGGGATC CAAGCTTGAT ATCCCATGGC TGAATGCTTA TCCCGTGCCT GGAACAAATG 
GC 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:
GATATCAAGC TTGGATCCC 
(2) INFORMATION FOR SEQ ID NO: 41: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41: 
CGGTACCTCG AGTTAAC 

(2) INFORMATION FOR SEQ ID NO: 42:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
CATGGTTAAC TCGAGGTACC GAGCT 
(2) INFORMATION FOR SEQ ID NO: 43: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 
ATCTGCATGG GTG 

(2) INFORMATION FOR SEQ ID NO: 44: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
GGGGACTCTA GAGGATCCAG 
(2) INFORMATION FOR SEQ ID NO: 45: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: 
GTTAACTCGA GGTACCGAGC TCGAATTTCC CC 
(2) INFORMATION FOR SEQ ID NO:46: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 
GAGTTCAGGC TTTTTCATAG CT 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 
AGATCTCGTG AGATAATGAA AAAG 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 

ACTCGCCGAT AGTGGAAACC GACGCCCCAG CACTCGTCCG AGGGCAAAGG AATAGTAAGA 
GCTCGG 

(2) INFORMATION FOR SEQ ID NO: 49: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 70 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

GATCCCGAGC TCTTACTATT CCTTTGCCCT CGGACGAGTG CTGGGGCGTC GGTTTCCACT 
ATCGGCGAGT 

(2) INFORMATION FOR SEQ ID NO: 50: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 88 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: 

CTGCAGGCCG GCCTTAATTA AGCGGCCGCG TTTAAACGCC CGGGCATTTA AATGGCGCGC 60 
CGCGATCGCT TGCAGATCTG CATGGGTG 88 
(2) INFORMATION FOR SEQ ID NO: 51: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

GACGGATCTG 10 
(2) INFORMATION FOR SEQ ID NO: 52: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

TGAGATCTGA GCTCGAATTT CCCC 24 
(2) INFORMATION FOR SEQ ID NO: 53: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: 

GGTACCCCCG GGGTCGACCA TGG 23
(2) INFORMATION FOR SEQ ID NO: 54: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
GGGAATTGGA GCTCGAATTT CCCC 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
GGGAAATTAA GCTT 

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 69 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

AGCGGCCGCA TTCCCGGGAA GCTTGCATGC CTGCAGAGAT CCGGTACCCG GGGATCCTCT 
AGAGTCGAC 

(2) INFORMATION FOR SEQ ID NO: 57: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 54 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57:

GGTACCCCCG GGGTCGACCA TGGTTAACTC GAGGTACCGA GCTCGAATTT CCCC 
(2) INFORMATION FOR SEQ ID NO: 58:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 
GGGAATTGGT TTAAACGCGG CCGCTT 
(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 

CCATGCATGG

We claim

1. An isolated DNA molecule selected from the following per5 promoter
sequences:

bp 4086-4148 of SEQ ID NO 1, 
bp 4086 to 4200 of SEQ ID NO 1, 
bp 4086 to 4215 of SEQ ID NO 1, 
bp 3187-4148 of SEQ ID NO 1, 
bp 3187-4200 of SEQ ID NO 1, 
bp 3187-4215 of SEQ ID NO 1,
bp 2532-4148 of SEQ ID NO 1, 
bp 2532-4200 of SEQ ID NO 1, 
bp 2532-4215 of SEQ ID NO 1, 
bp 1-4148 of SEQ ID NO 1, 

bp 1-4200 of SEQ ID NO 1, and 
bp 1-4215 of SEQ ID NO 1, 

or a fragment, genetic variant or deletion of such a sequence which retains the ability 
of functioning as a promoter in plant cells. 

2. An isolated DNA molecule selected from the following per5 intron sequences:

bp 4426-5058 of SEQ ID NO 1, 
bp 4420-5064 of SEQ ID NO 1, 
bp 5251-5382 of SEQ ID NO 1, 
bp 5245-5388 of SEQ ID NO 1, 
bp 5549-5649 of SEQ ID NO 1, and 
bp 5542-5654 of SEQ ID NO 1.

3. An isolated DNA molecule corresponding to the per5 transcription
termination sequence and having the sequence of bp 6068-6431 of SEQ ID NO 1.

4. An isolated DNA molecule having a 20 base pair nucleotide portion identical
in sequence to a consecutive 20 base pair portion of the sequence set forth in SEQ ID NO 1. 

5. A recombinant gene cassette competent for effecting preferential expression 
of a gene of interest in a selected tissue of transformed maize, said gene cassette comprising: 

a) a promoter operable in maize; 

b) an untranslated leader sequence; 

c) the gene of interest; 

d) a 3'UTR; 

said promoter, untranslated leader sequence, gene of interest, and 3'UTR being operably 
linked from 5' to 3'; and

e) an intron sequence that is incorporated in said untranslated leader sequence, in
said gene of interest, or in said 3'UTR, said intron sequence being from an intron of a maize 
gene that is preferentially expressed in said selected tissue, and said intron sequence being 
from a gene other than the gene of interest. 

6. A recombinant gene cassette of claim 5 wherein the promoter is from a first 
maize gene, said first maize gene being one that is naturally expressed preferentially in the 
selected tissue. 

7. A recombinant gene cassette of claim 5 wherein said intron sequence is 
incorporated in said untranslated leader. 

8. A recombinant gene cassette of claim 5 wherein said selected tissue is root
tissue.

9. A recombinant gene cassette of claim 8 wherein said intron sequence is 
comprised of bp 4420 to bp 5064 of SEQ ID NO 1. 

10. A recombinant gene cassette of claim 5 wherein said promoter is a per5
promoter comprised of bp 2532-4148 of SEQ ID NO 1. 

11. A recombinant gene cassette of claim 10 wherein said promoter is a per5
promoter comprised of bp 1-4148 of SEQ ID NO 1.

12. A recombinant gene cassette of claim 5 wherein the 3'UTR is a per5 3'UTR
comprised of bp 6068 to bp 6431 of SEQ ID NO 1. 

13. A recombinant gene cassette competent for effecting constitutive expression 
of a gene of interest in transformed maize comprising: 

a) a promoter from a first maize gene, said first maize gene being one that is 
naturally expressed preferentially in a specific tissue; 

b) an untranslated leader sequence; 

c) the gene of interest, said gene being one other than said first maize gene; 

d) a 3'UTR;

said promoter, untranslated sequence, gene of interest, and 3'UTR being operably linked 
from 5' to 3'; and 

e) an intron sequence that is incorporated in said untranslated leader or in said 
gene of interest, said intron sequence being from an intron of a maize gene that is naturally 
expressed constitutively. 

14. A recombinant gene cassette of claim 13 wherein said intron is the Adh1
intron 1 or an operative portion thereof. 

15. A recombinant gene cassette of claim 14 wherein said promoter is a per5
promoter comprised of bp 2532 to 4148 of SEQ ID NO 1, or an operative portion thereof. 

16. In a recombinant gene cassette for effecting expression of a gene of interest in 
a transformed plant cell wherein said gene cassette is comprised of: 

a promoter; 

an untranslated leader sequence; 

the gene of interest, said gene of interest being a gene other than per5; and
a 3'UTR; 

the improvement wherein said 3'UTR is a per5 3'UTR comprised of bp 6068 to 6431
of SEQ ID NO 1. 

17. A recombinant gene cassette of claim 16 wherein said promoter is selected
from the group consisting of the 35T promoter, the ubiquitin promoter, and the per5
promoter comprising bp 2532 to 4148 of SEQ ID NO 1.

18. A DNA construct comprising, operatively linked in the 5' to 3' direction,

a) a promoter comprising bp 4086-4148 bp of SEQ ID NO 1;

b) an untranslated leader sequence, 

c) a gene of interest not naturally associated with said promoter; 

d) a 3'UTR. 

19. A DNA construct of claim 18 wherein the promoter and untranslated leader
sequence together comprise bp 4086-4200 of SEQ ID NO 1.

20. A DNA construct of claim 18 wherein the promoter is comprised of bp 3187-4148
of SEQ ID NO 1.

21. A DNA construct of claim 18 wherein the promoter is comprised of bp 2532-4148
of SEQ ID NO 1.

22. A DNA construct of claim 18 wherein the promoter is comprised of bp 1-4148
of SEQ ID NO 1.

23. A DNA construct of claim 18 wherein said 3'UTR is the nos 3'UTR. 

24. A DNA construct of claim 18 wherein said 3'UTR has the sequence of bp
6066-6550 of SEQ ID NO 1.

25. A DNA construct comprising, operatively linked in the 5' to 3' direction,

a) a promoter comprised of bp 4086-4148 bp of SEQ ID NO 1;

b) an intron selected from the group consisting of Adh1 intron 1 and bp 4426-5058
of SEQ ID NO 1;

c) a gene of interest not normally associated with said promoter; 

d) a 3'UTR. 

26. A DNA construct of claim 25 wherein said 3'UTR is selected from the group 
consisting of nos and bp 6067-6340 of SEQ ID NO 1. 

27. A DNA construct of claim 25 wherein said 3'UTR is selected from the group 
consisting of nos and bp 6067-6439 of SEQ ID NO 1. 

28. A DNA construct comprising, in the 5' to 3' direction,

a) a promoter having as at least part of its sequence bp 4086-4148 bp of SEQ ID
NO 1;

b) an intron selected from the group consisting of Adh1 intron 1 and bp 4426-5058
of SEQ ID NO 1;

c) a cloning site; 

d) a 3'UTR. 

29. A DNA construct of claim 28 wherein said 3'UTR is selected from the group
consisting of nos and bp 6067-6340 of SEQ ID NO 1. 

30. A plasmid including a promoter that is comprised of bp 4086-4148 of SEQ ID
NO 1.

31. A plasmid of claim 30 wherein the promoter is comprised of bp 3187-4148 of
SEQ ID NO 1.

32. A plasmid of claim 30 wherein the promoter is comprised of bp 2532-4148 of
SEQ ID NO 1.

33. A plasmid of claim 30 wherein the promoter is comprised of bp 1-4148 of
SEQ ID NO 1.
34. A plasmid comprising a recombinant gene cassette of claim 5. 

35. A plasmid comprising a DNA construct of claim 18. 

36. A transformed plant comprising at least one plant cell that contains a 
recombinant gene cassette according to claim 5. 

37. A transformed plant comprising at least one plant cell that contains a DNA
construct according to claim 18.

38. Seed or grain that contains a recombinant gene cassette of claim 5.

39. Seed or grain that contains a DNA construct of claim 18.

40. A method for expressing a gene of interest preferentially in a selected tissue 
which comprises transforming maize with a gene cassette of claim 5. 

41. A method for expressing a gene of interest in maize preferentially in root
tissue which comprises transforming maize with a gene cassette of claim 5 wherein the 
selected tissue is root tissue.

42. A method of claim 41 wherein the intron sequence in the gene cassette is
comprised of bp 4420 to 5064 of SEQ ID NO 1.

43. A method of claim 40 wherein the promoter in the gene cassette is a per5
promoter comprised of bp 2532 to 4148 of SEQ ID NO 1, or an operative portion thereof.
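
The claims identify each element by a base-pair range of SEQ ID NO 1. The following is a minimal sketch of how such ranges map onto a sequence string, assuming the ranges are 1-based and inclusive (the usual numbering of a sequence listing, but an assumption here). The names `region` and `CLAIMED_REGIONS` are illustrative, and the placeholder string merely stands in for SEQ ID NO 1, which is not reproduced in this sketch.

```python
def region(sequence, first_bp, last_bp):
    """Return bases first_bp..last_bp, counting from 1 and including both ends."""
    if not 1 <= first_bp <= last_bp <= len(sequence):
        raise ValueError(f"range {first_bp}-{last_bp} lies outside the sequence")
    return sequence[first_bp - 1:last_bp]


# Ranges copied from claims 1 to 3; the labels are informal descriptions.
CLAIMED_REGIONS = {
    "per5 promoter fragment (claim 1)": (4086, 4148),
    "per5 intron (claim 2)": (4426, 5058),
    "per5 transcription termination sequence (claim 3)": (6068, 6431),
}

# Synthetic 6800-base stand-in; this is NOT the sequence of SEQ ID NO 1.
placeholder = "ACGT" * 1700

for name, (first_bp, last_bp) in CLAIMED_REGIONS.items():
    print(name, len(region(placeholder, first_bp, last_bp)), "bp")
```

Under this convention a range a-b spans b - a + 1 bases, so bp 4086-4148 is 63 bases long.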
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